Abstract

Evaluating conjunctive queries and solving constraint satisfaction problems
are fundamental problems in database theory and artificial intelligence,
respectively. These problems are NP-hard, so several research efforts have
been made in the literature to identify tractable classes, known as islands
of tractability, as well as to devise clever heuristics for efficiently
solving real-world instances.

Many heuristic approaches are based on enforcing on the given instance 
a property called local consistency, where (in database terms) each tuple in 
every query atom matches at least one tuple in every other query atom. 
Interestingly, it turns out that, for many well-known classes of queries, 
such as for the acyclic queries, enforcing local consistency is even sufficient 
to solve the given instance correctly. However, the precise power of such
a procedure was unclear, except in some very restricted cases.

The paper provides full answers to the long-standing questions about 
the precise power of algorithms based on enforcing local consistency. In 
particular, the paper deals with both the general framework of tree projec- 
tions, where local consistency is enforced among arbitrary views defined 
over the given database instance, and the specific cases where such views 
are computed according to so-called structural decomposition methods, 
such as generalized hypertree width, component hypertree decomposi- 
tions, and so on. 

The classes of instances where enforcing local consistency turns out to 
be a correct query-answering procedure are however not efficiently recog- 
nizable. In fact, the paper finally focuses on certain subclasses defined in 
terms of the novel notion of greedy tree projections. These latter classes are 
shown to be efficiently recognizable and strictly larger than most islands 
of tractability known so far, both in the general case of tree projections 
and for specific structural decomposition methods. 






1 Introduction 



1.1 Acyclic Conjunctive Queries 

Answering conjunctive queries to relational databases is a basic problem in 
database theory, and it is equivalent to many other fundamental problems, such 
as conjunctive query containment and constraint satisfaction. Recall that con- 
junctive queries are defined through conjunctions of atoms (without negation), 
and are known to be equivalent to Select-Project-Join queries. The problem of
evaluating such queries is NP-hard in general, but it is feasible in polynomial 
time on the class of acyclic queries (we omit "conjunctive," hereafter), which
has been the subject of many seminal research works since the early days of database
theory (see, e.g., [7]). This class contains all queries Q whose associated query
hypergraph $\mathcal{H}_Q$ is acyclic,^1 where $\mathcal{H}_Q$ is a hypergraph having the variables of Q
as its nodes, and the (sets of variables occurring in the) atoms of Q as its hyperedges.
It is well known that acyclic queries enjoy a number of highly desirable
properties, recalled next. 

First, acyclic queries can be efficiently solved. From any acyclic query, we can
build (in linear time) a join tree [8], which is a tree whose vertices correspond
to the various atoms and where the subgraph induced by vertices containing
any given variable is a tree. According to Yannakakis's algorithm [52], Boolean
acyclic queries can be evaluated by processing any of their join trees bottom-up, 
by performing upward semijoins between the relations associated with the query 
atoms, thus keeping the size of the intermediate relations small. At the end, 
if the relation associated with the root of the join tree is not empty, then the 
answer of the query is not empty. For non-Boolean queries, after the bottom-up 
step described above, one can perform the opposite top-down step by filtering 
out of each child vertex those tuples that do not match any tuple of its parent. The
filtered database, called full reducer, then enjoys the global consistency property:
every tuple in every relation participates in some solution. By exploiting this
property, all solutions can be computed with a backtrack-free procedure (i.e.,
with backtracks used to look for further solutions, and never caused by wrong
choices).
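The two semijoin passes described above can be sketched as follows. This is only a minimal illustration under stated assumptions: relations are stored as (variables, set-of-tuples) pairs, the rooted join tree is supplied by the caller as a dictionary from each vertex to its children, and the names `semijoin` and `full_reduce` are hypothetical helpers, not notation from [52].

```python
from typing import Dict, List, Set, Tuple

# A relation is a pair (variables, tuples); this encoding is an assumption of the sketch.
Relation = Tuple[Tuple[str, ...], Set[tuple]]


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the tuples of `left` that match at least one tuple of `right`."""
    lvars, ltuples = left
    rvars, rtuples = right
    shared = [v for v in lvars if v in rvars]
    lpos = [lvars.index(v) for v in shared]
    rpos = [rvars.index(v) for v in shared]
    keys = {tuple(t[i] for i in rpos) for t in rtuples}
    return lvars, {t for t in ltuples if tuple(t[i] for i in lpos) in keys}


def full_reduce(rels: Dict[str, Relation],
                children: Dict[str, List[str]],
                root: str) -> Dict[str, Relation]:
    """Bottom-up pass followed by top-down pass over a rooted join tree."""
    rels = dict(rels)  # work on a copy; the input is left untouched

    def up(node: str) -> None:
        # Reduce each child first, then filter the parent against it.
        for child in children.get(node, []):
            up(child)
            rels[node] = semijoin(rels[node], rels[child])

    def down(node: str) -> None:
        # Filter each child against its (already reduced) parent.
        for child in children.get(node, []):
            rels[child] = semijoin(rels[child], rels[node])
            down(child)

    up(root)
    down(root)
    return rels
```

After the bottom-up pass alone, a Boolean query has a non-empty answer exactly when the relation at the root is non-empty; the top-down pass is what removes the remaining dangling tuples from the other relations.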

Second, the class of acyclic instances coincides with the class of queries where 
local consistency entails global consistency. We say that local (also, pairwise) 
consistency holds if the relations associated with the query atoms are not empty 
and we do not miss any tuple by taking semijoins between any pair of them. 
The acyclic instances that fulfil this property also fulfil the global consistency 
property [7]. Note that local consistency may easily be enforced by taking the 
semijoins between all pairs of atoms until a fixpoint is reached. Therefore, in 
abstract terms, any acyclic query can be answered by means of "local" compu- 
tations only, without any additional knowledge about the whole structure, in 

^1 For completeness, observe that different notions of hypergraph acyclicity have been
proposed in the literature. This paper follows the standard definition of acyclic conjunctive
queries, so that hypergraph acyclicity always refers to the most liberal notion, known as
α-acyclicity [18].
particular without computing any join tree of the query. In addition, and more
surprisingly, if a class of instances can be answered by means of this approach,
then it only contains acyclic instances.^2
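Enforcing local consistency, and its limits on cyclic queries, can be illustrated with a small sketch. The encoding of relations as (variables, set-of-tuples) pairs, the helper names, and both toy databases are assumptions made for illustration only; `answers` is a brute-force check meant for tiny inputs, not an evaluation method proposed in the paper.

```python
from itertools import permutations, product


def semijoin(left, right):
    """Tuples of `left` that agree with some tuple of `right` on shared variables."""
    (lvars, ltuples), (rvars, rtuples) = left, right
    shared = [v for v in lvars if v in rvars]
    keys = {tuple(t[rvars.index(v)] for v in shared) for t in rtuples}
    kept = {t for t in ltuples if tuple(t[lvars.index(v)] for v in shared) in keys}
    return lvars, kept


def enforce_local_consistency(rels):
    """Take semijoins between all ordered pairs of atoms until a fixpoint is reached."""
    rels = dict(rels)
    changed = True
    while changed:
        changed = False
        for a, b in permutations(list(rels), 2):
            reduced = semijoin(rels[a], rels[b])
            if len(reduced[1]) < len(rels[a][1]):
                rels[a] = reduced
                changed = True
    return rels


def answers(rels):
    """Brute-force query evaluation over the active domain (tiny inputs only)."""
    variables = sorted({v for vs, _ in rels.values() for v in vs})
    domain = sorted({c for _, ts in rels.values() for t in ts for c in t})
    result = []
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in vs) in ts for vs, ts in rels.values()):
            result.append(env)
    return result


# Acyclic path query: the fixpoint keeps only tuples that occur in a solution.
path = {
    "r1": (("A", "B"), {(1, 2), (5, 6)}),
    "r2": (("B", "C"), {(2, 3), (7, 8)}),
    "r3": (("C", "D"), {(3, 4)}),
}

# Cyclic triangle query: every atom forces its two variables to differ over {0, 1}.
differ = {(0, 1), (1, 0)}
triangle = {
    "r1": (("A", "B"), differ),
    "r2": (("B", "C"), differ),
    "r3": (("C", "A"), differ),
}
```

On the path query the fixpoint leaves exactly one tuple per relation, and each of them belongs to the unique answer. On the triangle the database is already locally consistent and non-empty, yet the query has no answer, which is the kind of behaviour that acyclicity rules out.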

Finally, acyclicity is efficiently recognizable. Deciding whether a hypergraph 
is acyclic is feasible in linear time [50], and also in deterministic logspace. In
fact, this latter property follows from the fact that hypergraph acyclicity belongs
to SL [23], and that SL is equal to deterministic logspace [45]. Note that, in the
light of this property and the first one above, these queries identify a so-called
(accessible) "island of tractability" for the query answering problem [35].



1.2 Generalization of Acyclicity 

Queries arising from real applications are rarely precisely acyclic. Yet, they
are often not very intricate and, in fact, tend to exhibit some limited degree of 
cyclicity, which suffices to retain most of the nice properties of acyclic ones. 

Considerable effort has been spent investigating invariants that are best suited
to identify nearly-acyclic hypergraphs, leading to the definition of a number of
so-called (purely) structural decomposition methods, such as the (generalized)
hypertree [24], fractional hypertree [35], spread-cut [14], and component hypertree
[26] decompositions. These methods aim at transforming a given cyclic
hypergraph into an acyclic one, by organizing its edges (or its nodes) into a 
polynomial number of clusters, and by suitably arranging these clusters as a 
tree, called decomposition tree. The original problem instance can then be eval- 
uated over such a tree of subproblems, with a cost that is exponential in the 
cardinality of the largest cluster, also called width of the decomposition, and 
polynomial if this width is bounded by some constant. 

Despite their different technical definitions, there is a simple mathematical 
framework that encompasses all the above decomposition methods, which is 
the framework of tree projections [29]. In this setting, a query Q is given
together with a set V of atoms, called views, which are defined over the variables 
in Q. The question is whether (parts of) the views can be arranged as to form 
a tree projection (playing the role of a decomposition tree), i.e., a novel acyclic 
query that still "covers" Q. By representing Q and V via the hypergraphs $\mathcal{H}_Q$
and $\mathcal{H}_\mathcal{V}$, where hyperedges correspond one-to-one with query atoms and views,
respectively, the tree projection problem reveals its graph-theoretic nature. For
a pair of hypergraphs $\mathcal{H}_1$, $\mathcal{H}_2$, let $\mathcal{H}_1 \leq \mathcal{H}_2$ denote that each hyperedge of $\mathcal{H}_1$
is contained in some hyperedge of $\mathcal{H}_2$. Then, a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_\mathcal{V}$
is any acyclic hypergraph $\mathcal{H}_a$ such that $\mathcal{H}_Q \leq \mathcal{H}_a \leq \mathcal{H}_\mathcal{V}$. If such a hypergraph
exists, then we say that the pair of hypergraphs $(\mathcal{H}_Q, \mathcal{H}_\mathcal{V})$ has a tree projection.
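Verifying that a given candidate is a tree projection only requires the covering relation and an acyclicity test. The sketch below is illustrative: hypergraphs are assumed to be lists of sets of node names, acyclicity is tested with the classical GYO reduction (a simple polynomial procedure, not the linear-time algorithm of [50]), and the function names are hypothetical. It checks a candidate; it does not search for one.

```python
def covered_by(h1, h2):
    """True if each hyperedge of h1 is contained in some hyperedge of h2."""
    return all(any(set(e) <= set(f) for f in h2) for e in h1)


def is_acyclic(hyperedges):
    """GYO reduction: the hypergraph is alpha-acyclic iff it reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every node that occurs in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(e <= f for j, f in enumerate(edges) if j != i):
                del edges[i]
                changed = True
                break
    return not edges


def is_tree_projection(hq, ha, hv):
    """Check that ha is acyclic and that hq <= ha <= hv in the covering order."""
    return is_acyclic(ha) and covered_by(hq, ha) and covered_by(ha, hv)
```

For instance, with the triangle `[{"A","B"}, {"B","C"}, {"C","A"}]` as query hypergraph and a view set that additionally contains `{"A","B","C"}`, the single hyperedge `{"A","B","C"}` passes the check, while the triangle itself does not because it is cyclic.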

Example 1.1 Consider the conjunctive query 

$Q_0: r_1(A,B,C) \wedge r_2(A,F) \wedge r_3(C,D) \wedge r_4(D,E,F) \wedge
r_5(E,F,G) \wedge r_6(G,H,I) \wedge r_7(I,J) \wedge r_8(J,K),$

^2 Actually, this classical result holds only for queries where every relation symbol is used
at most once. The precise power of local computations in the general case is identified in this 
paper (for acyclic queries too). 
Figure 1: A tree projection $\mathcal{H}_a$ of $\mathcal{H}_{Q_0}$ w.r.t. $\mathcal{H}_{\mathcal{V}_0}$. On the right: A join tree $JT_a$
for $\mathcal{H}_a$.

whose associated hypergraph $\mathcal{H}_{Q_0}$ is depicted in Figure 1, together with other
hypergraphs that are discussed next.

To answer $Q_0$, assume that a set $\mathcal{V}_0$ of views is available comprising some
views, called query views, playing the role of query atoms, plus four additional
views. The set of variables of each view is a hyperedge in the hypergraph $\mathcal{H}_{\mathcal{V}_0}$
(query views are depicted as dashed hyperedges). In the middle between $\mathcal{H}_{Q_0}$
and $\mathcal{H}_{\mathcal{V}_0}$, Figure 1 reports the hypergraph $\mathcal{H}_a$, which covers $\mathcal{H}_{Q_0}$ and which
is in turn covered by $\mathcal{H}_{\mathcal{V}_0}$; e.g., $\{C,D\} \subseteq \{A,B,C,D\} \subseteq \{A,B,C,D,H\}$.
Since $\mathcal{H}_a$ is in addition acyclic (just check the join tree $JT_a$ in the figure), $\mathcal{H}_a$
is a tree projection of $\mathcal{H}_{Q_0}$ w.r.t. $\mathcal{H}_{\mathcal{V}_0}$. ◁

Observe that, in the tree projection framework, views can be arbitrary, i.e.,
they do not depend on the specific conjunctive query Q, and can be reused to 
answer different queries. In particular, views may be the materialized output of 
any procedure over the database, possibly much more powerful than conjunctive 
queries. Moreover, it is known and easy to see that any decomposition method 
based on clustering subproblems can be viewed as an instance of this general 
setting, identifying a specific set of views to answer a given query Q efficiently 
(see Section 2 and Section 4).

For example (see, e.g., [H], [30], [32]), for any fixed natural number k, the
generalized hypertree decomposition method associates with any query Q a set
$v\text{-}hw_k(Q)$ of views, containing one distinct view over each set of variables that
can be covered by at most k query atoms. For any hypergraph $\mathcal{H}$, let $\mathcal{H}^k$ be
the hypergraph whose hyperedges are all possible sets obtained by the union
of at most k hyperedges of $\mathcal{H}$, and notice that $\mathcal{H}^k_Q$ is precisely the hypergraph
associated with $v\text{-}hw_k(Q)$. A query Q has generalized hypertree width bounded
by k if, and only if, there is a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}^k_Q$.
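The construction of the hypergraph of unions of at most k hyperedges is straightforward to write down. The following sketch assumes hyperedges are given as sets of node names; `k_union_hypergraph` is an illustrative name, not one used in the paper. Note that the number of hyperedges produced grows as the number of hyperedges of the input raised to the power k, which is polynomial only for fixed k.

```python
from itertools import combinations


def k_union_hypergraph(hyperedges, k):
    """Hyperedges obtained as the union of at least 1 and at most k input hyperedges."""
    edges = [frozenset(e) for e in hyperedges]
    return {
        frozenset().union(*combo)
        for size in range(1, k + 1)
        for combo in combinations(edges, size)
    }
```

For the triangle with hyperedges {A,B}, {B,C}, {C,A} and k = 2, the result contains the three original hyperedges plus {A,B,C}; the latter alone is an acyclic hypergraph covering the triangle, so the triangle has generalized hypertree width at most 2.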

For another example, we recall the tree decomposition method [T71I1T], based
on the notion of treewidth [42], which is the most general decomposition method
over classes of bounded-arity queries (see, e.g., [22], [34]). For any fixed natural
number k, the method defines the set $v\text{-}tw_k(Q)$ of views containing one distinct
view over each set of at most k + 1 variables occurring in Q. Let $\mathcal{H}^{tk}_Q$ be the
hypergraph associated with $v\text{-}tw_k(Q)$, i.e., the hypergraph whose hyperedges
are all possible sets of at most k + 1 variables. Then, a query Q has treewidth
bounded by k if, and only if, there is a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}^{tk}_Q$ (see.

In fact, the notion of tree projection is quite natural and may be exploited 
in different applications where hypergraphs naturally represent structural properties
of input instances. For example, Adler [3] pointed out that the notion
of acyclicity for a conjunctive query with negation Q, as defined in [19], can
be immediately recast as the existence of a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_{Q^+}$,
where the hyperedges of $\mathcal{H}_{Q^+}$ are the sets of variables occurring in the positive
atoms of Q only, while the hyperedges of $\mathcal{H}_Q$ correspond to all atoms, including
the negative ones. Then, we can generalize this notion to obtain larger classes
of tractable instances, by saying that a query with negation Q has generalized
hypertree width at most k if the pair $(\mathcal{H}_Q, \mathcal{H}^k_{Q^+})$ has a tree projection. Indeed,
following the same reasoning as in [12], it is easy to see that, given such a tree 
projection, the query Q can be evaluated in polynomial time. 

1.3 Open Questions About Tree Projections and Structural Decomposition Methods

Interest in the tree projection framework goes back to the eighties, when
it was noticed that queries that admit a tree projection can be evaluated in
polynomial time [29] (see, also, [H]). Thus, tree projections smoothly preserve
the first crucial property of acyclic queries discussed in Section 1.1. Our knowledge
about the preservation of the other properties of acyclic queries was instead
less clear. In fact, the following two questions have been posed in the literature
for the general tree projection framework as well as for structural decomposition 
methods specifically tailored to deal with classes of queries without a fixed arity 
bound. Such questions were in particular open for the generalized hypertree 
decomposition method, which on classes of unbounded-arity queries is a natural 
counterpart of the tree decomposition method. 

(Q1) What is the precise power of local-consistency based algorithms?

This question was first raised in [7], and specifically for the general case of tree
projections in [4^, and has remained open so far, although it was attacked via different
approaches and proof techniques, which gave some partial results, reported
below.

Let V be an arbitrary set of views, which also contains the query views 
representing the atoms of a given query Q. Let lc(V, DB) denote that the views 
in V evaluated over a database DB enjoy the local consistency property, i.e., they 
are non-empty and we do not miss any tuple by taking the semijoin between any 
pair of views. Let red(V,DB) be the reduct of DB according to V, computed 
by taking all possible semijoins until a fixpoint is reached. More precisely, 
red(V,DB) is the (set-inclusion) maximal subset DB' of DB such that lc(V,DB')
holds, or red(V,DB) = ∅, whenever such a maximal subset does not exist. Let
gc(V, DB, Q) denote that the global consistency property holds, i.e., every tuple 
in every query view (evaluated over DB) participates in the query answer. Let 
$Q^{\mathrm{DB}} \neq \emptyset$ denote that the answer of Q on DB is not empty. Then, the picture
emerging from the literature is as follows:

- The existence of a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_\mathcal{V}$ entails that, $\forall \mathrm{DB}$, lc(V, DB) $\Rightarrow$
gc(V, DB, Q) [44]. In words, the existence of a tree projection is a sufficient
condition for the global consistency property to hold, whenever the database is
locally consistent. Thus, if a tree projection exists, then both deciding whether
the query is not empty and computing a query answer (if any) are feasible
in polynomial time, by enforcing local consistency. Observe that such a procedure
is based on local computations only, and hence there is no need to
actually compute a tree projection. This is a remarkable result, since computing
a tree projection is instead not feasible in polynomial time, unless
P = NP [26]. It was conjectured that the existence of a tree projection is also
a necessary condition for having this property.

- Consider classes of bounded-arity queries Q, and the tree decomposition
method, hence the view set $v\text{-}tw_k(Q)$ with its associated hypergraph $\mathcal{H}^{tk}_Q$.
For any database DB, let $d\text{-}tw_k(Q, \mathrm{DB})$ be the database obtained by associating
each view in $v\text{-}tw_k(Q)$ with the Cartesian product of the sets of
constants that the variables occurring in it may take. It is known that, $\forall \mathrm{DB}$,
$(red(v\text{-}tw_k(Q), d\text{-}tw_k(Q, \mathrm{DB})) \neq \emptyset) \Rightarrow (Q^{\mathrm{DB}} \neq \emptyset)$ if [15], and only if [6], there
is a tree projection of $\mathcal{H}_{Q'}$ w.r.t. $\mathcal{H}^{tk}_{Q'}$, for some core Q' of Q. In fact, the result
holds for any query Q' that is homomorphically equivalent to Q, denoted by
$Q' \approx_{hom} Q$ (instead of just for a core, which is any smallest one). This result
provides a necessary and sufficient condition for query answering via local
consistency, without computing any tree decomposition of such a subquery
Q', which would be an NP-hard task [15]. Observe that the necessary condition
holds only for structures of bounded arity, and the result provides
information only about the decision problem (i.e., checking whether the answer is
empty or not).

- For the general case of queries Q with unbounded arity, consider the generalized
hypertree decomposition method and hence the view set $v\text{-}hw_k(Q)$,
containing one distinct view over each set of variables that can be covered
by at most k query atoms, and its associated hypergraph $\mathcal{H}^k_Q$. Moreover, for
any database DB, let $d\text{-}hw_k(Q, \mathrm{DB})$ be the database obtained by associating
each view in $v\text{-}hw_k(Q)$ with the (natural) join of all query views over which
it is defined. It is known that, $\forall \mathrm{DB}$,
$(red(v\text{-}hw_k(Q), d\text{-}hw_k(Q, \mathrm{DB})) \neq \emptyset) \Rightarrow (Q^{\mathrm{DB}} \neq \emptyset)$
if there exists a tree projection of $\mathcal{H}_{Q'}$ w.r.t. $\mathcal{H}^k_{Q'}$, where Q' is
any query such that $Q' \approx_{hom} Q$ [H]. Note that, when we focus on generalized
hypertree decompositions, instead of looking at views in $v\text{-}hw_k(Q)$ and
tree projections, we may directly look at the consistency between every pair
of sets of k atoms, also called k-local consistency. Hence, the result states
a sufficient condition for deciding whether the answer is empty or not by
enforcing k-local consistency, (again) without actually identifying such a subquery
Q' and without computing a generalized hypertree decomposition of
Q', which are both NP-hard tasks. It was open whether the condition is also
necessary [12]. Moreover, as in the above point about tree decompositions,
the relationship with global consistency and hence with the related problem of
computing solutions was missing.

From these results, it emerges that the precise power of local-consistency based 
computations and of their relationships with tree projections and with the other 
structural decomposition methods (in particular, tree decompositions and gen- 
eralized hypertree decompositions) was far from being clear: Is it possible that 
there are queries where such local computations do work even if no decomposi- 
tion (or tree projection) exists? 

For instance, from the above recent results based on homomorphically equiv- 
alent subqueries for tree decompositions and generalized hypertree decomposi- 
tions, one may deduce that the mentioned conjecture in [321131] (i.e., that local 
consistency implies global consistency if, and only if, a tree projection of the 
query hypergraph exists) may not hold, in general. This is because in the case 
of queries with multiple occurrences of the same relation symbol, the concept 
of core of the query plays a crucial role [15], as should be clear from the next
example. 

Example 1.2 Consider the following queries: 

$Q_1: r(A,B) \wedge r(B,C) \wedge r(C,D) \wedge r(D,A)$
$Q_2: r(A,B) \wedge r(B,C) \wedge r(D,C) \wedge r(A,D)$
$Q_3: r(B,A) \wedge r(C,B) \wedge r(C,D) \wedge r(D,A)$

These queries are completely equivalent as far as their hypergraphs are concerned,
since $\mathcal{H}_{Q_1} = \mathcal{H}_{Q_2} = \mathcal{H}_{Q_3}$. However, $Q_1$ is already a core, while
a core of $Q_2$ (resp., $Q_3$) is the acyclic subquery $r(A,B) \wedge r(B,C)$ (resp.,
$r(C,D) \wedge r(D,A)$). Thus, by focusing on $Q_2$ and $Q_3$ rather than on their
cores, we could overestimate their intricacy. ◁
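The cores in this example can be checked mechanically on such small queries. The sketch below is a brute-force illustration under stated assumptions: atoms are encoded as (relation symbol, tuple of variables) pairs, all terms are variables, and `core` is a hypothetical helper that enumerates every mapping of variables to variables, keeps the endomorphisms, and returns an image with the fewest atoms. It is exponential in the number of variables and is meant only for toy inputs.

```python
from itertools import product


def core(atoms):
    """Return a smallest endomorphic image of the query (one of its cores)."""
    atom_set = set(atoms)
    variables = sorted({v for _, args in atom_set for v in args})
    best = atom_set
    for image in product(variables, repeat=len(variables)):
        h = dict(zip(variables, image))
        mapped = {(rel, tuple(h[v] for v in args)) for rel, args in atom_set}
        # h is an endomorphism iff every mapped atom is again an atom of the query.
        if mapped <= atom_set and len(mapped) < len(best):
            best = mapped
    return best


q1 = [("r", ("A", "B")), ("r", ("B", "C")), ("r", ("C", "D")), ("r", ("D", "A"))]
q2 = [("r", ("A", "B")), ("r", ("B", "C")), ("r", ("D", "C")), ("r", ("A", "D"))]
q3 = [("r", ("B", "A")), ("r", ("C", "B")), ("r", ("C", "D")), ("r", ("D", "A"))]
```

Since cores are unique only up to isomorphism, the particular two-atom subquery returned for the second and third query may differ from the one named in the example while still being a core.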

However, the above conjecture might still hold in the original setting con- 
sidered in [35], where all relation symbols in a query are distinct. 

(Q2) Are there unexplored islands of tractability based on tree pro- 
jections? An island of tractability in the tree projection framework is a class 
C of pairs (Q,V) that can be efficiently recognized, and such that Q can be 
efficiently evaluated on every database, by possibly exploiting the views that 
are available in V. 

Many specializations of tree projections, such as tree decompositions [42],
hypertree decompositions [24], component decompositions [26], and spread-cut
decompositions [14], define islands of tractability whenever some fixed bound
is imposed on their widths. This is also the case for fractional hypertree decompositions
[35], whenever the resources sufficient for computing their $O(w^3)$
approximation [40] are used as available views. However, this is not the case
for general tree projections. Indeed, while Goodman and Shmueli [33] observed
that queries that admit a tree projection can be evaluated in polynomial
time, Gottlob et al. [15] proved that checking whether a tree projection
exists or not is an NP-hard problem. Hence, the class
$C_{tp} = \{(Q, \mathcal{V}) \mid \mathcal{H}_Q \text{ has a tree projection w.r.t. } \mathcal{H}_\mathcal{V}\}$, which includes all the above mentioned
islands of tractability, is not an island of tractability in its turn. In fact, in
addition to the above result, we also know that: 

- Deciding whether a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}^{tk}_Q$ (corresponding to a tree
decomposition) exists is feasible in time $O(2^{ck^3} \times n)$, where n is the size of
$\mathcal{H}_Q$, k is the treewidth, and c is a constant [S], hence in linear time for a fixed
width k.

- The problem remains NP-hard for the case of generalized hypertree decompositions,
that is, when we have to decide the existence of a tree projection
of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}^k_Q$, even if k is a fixed number (greater than 2) [55].

Moreover, recall that the sufficient conditions we have discussed in the previ- 
ous point (Ql) do not identify (accessible) islands of tractability, because their 
recognition problems are NP-hard, too. Such conditions are particularly use- 
ful in those settings where it is intractable to compute any tree projection, so 
that answers are computed via procedures enforcing local consistency. However, 
having a tree projection at hand allows queries to be evaluated more efficiently
than techniques based on "blind" local-consistency enforcing. Intuitively, by
having such a projection Ha and hence a join tree for Ha, we are able to ex- 
ploit all the well known algorithms developed for acyclic queries. In particular, 
in this approach, only the views occurring in the join tree are involved in the 
query evaluation, while all available views should be used if no tree projection 
is available. Furthermore, the number of semijoin operations to be performed 
having the join tree is at most the number of nodes in such a tree and does not 
depend on the database, as it happens instead while enforcing local consistency. 
Therefore, a natural question is whether there is any subclass of Ctp, at least 
including all the tractable classes mentioned above, which identifies an actual 
island of tractability where tree projections can be computed efficiently. 

1.4 Contribution 

In this paper, we provide a clear picture of the power of tree projections and 
structural decomposition methods, by answering the two questions illustrated 
above. 

It is worthwhile noting that our answers, summarized below, find applica- 
tions in all those problems that can be solved efficiently on acyclic and quasi- 
acyclic instances, even outside the Database area. In particular, our results can 
be exploited immediately for solving Constraint Satisfaction Problems (CSPs) 
where constraints are represented as finite relations encoding allowed tuples of 
values (see, e.g., [22]). 
(Q1) The first achievement of this paper is to solve the long-standing question
about the power of local-consistency based computations, by addressing in the 
analysis both the decision problem of checking whether the query is not empty, 
and the problem of characterizing a necessary and sufficient condition guaran- 
teeing that local consistency entails global consistency, which is useful from the 
query answering perspective. 

Concerning the decision problem of checking whether the query has a solu- 
tion, we show that the sufficient conditions identified for some specializations 
of tree decompositions are also necessary, even in the most general framework. 
However, the technical machinery needed for obtaining our results is quite dif- 
ferent from the one used in [5] for tree decompositions, which does not work 
when we have arbitrary signatures or arbitrary views. Our first contribution is 
to show that: 



The following are equivalent: 

(1) For every database DB, lc(V,DB) entails $Q^{\mathrm{DB}} \neq \emptyset$.

(2) There is a subquery $Q' \approx_{hom} Q$ for which $(\mathcal{H}_{Q'}, \mathcal{H}_\mathcal{V})$ has a tree
projection.



Our second contribution is then to single out the (stronger) conditions un- 
der which local consistency entails global consistency. We show that finding a 
necessary and sufficient condition requires exploiting possible endomorphisms of
the query. It emerged that, to characterize when, at local consistency, an atom
p contains all, and only, the correct tuples of the query Q projected over the
variables $vars(p) = \{X_1, \ldots, X_n\}$ of p, we must look for tree projections of some
"output-aware" substructures of Q. We say that $\{X_1, \ldots, X_n\}$ is tp-covered in Q
(w.r.t. V) if there is a tree projection of $(\mathcal{H}_{Q_p}, \mathcal{H}_\mathcal{V})$, where $Q_p$ is a core of the
novel query $Q \wedge r(X_1, \ldots, X_n)$, in which r is a fresh relation symbol. Intuitively,
r is used to force any such core to contain the desired variables $\{X_1, \ldots, X_n\}$.
It turns out that, for having global consistency guaranteed by local consistency,
for each query atom p, a tree projection $\mathcal{H}_p$ of such a $Q_p$ must exist.

The following are equivalent: 

(1) For every database DB, lc(V,DB) entails gc(V,DB,Q). 

(2) For each query atom q, vars(q) is tp-covered in Q.

Thus, if (2) holds and one is interested in computing query answers over 
output variables included in some query atom, then all solutions are immediately 
available. In fact, the above result comes in the paper as a specialization of a 
more general result dealing with those cases where one is interested in computing 
answers over an arbitrary subset of variables covered by some available view. 
Moreover, observe that in the above condition different tree projections for
different query atoms are allowed. That is, global consistency can hold even if 
there is no tree projection that is able to cover all query atoms at once. However, 
if every relation symbol is used at most once in the query, it is easy to see that 
(2) is equivalent to requiring that a tree projection of the whole query exists. 
Hence, the conjecture of [33] about the necessity of having a tree projection of 
the query does not hold in general, but it does hold for such a restricted setting 
(in fact, the one considered in [12]).

Actually, in this informal statement we have implicitly assumed databases 
where views are not more restrictive than the query; otherwise, using such views 
may clearly lead to missing some tuple in the query answer. Note that this 
condition trivially holds whenever views are computed from parts of the query 
(i.e., they are in fact subqueries), which happens in structural decomposition 
methods. However, this is not necessarily true if one would like to exploit 
existing materialized views. Anyway, we show that soundness of query answers 
is always guaranteed. If views are too restrictive w.r.t. Q, then we may just 
miss completeness. 

(Q1: Application to Decomposition Methods) As a direct consequence
of our contribution w.r.t. question (Q1), we get in a single result the generalization
of all tractability results known for purely structural decomposition
methods (because all of them are specializations of the notion of tree projections).
Moreover, we provide the precise characterization of the power of k-local
consistency for classes of queries without a fixed bound on the arity, which was
missing in [6] and [12].

In particular, we provide a necessary and sufficient condition such that k- 
local consistency entails global consistency, which is useful for computing solu- 
tions. Furthermore, concerning the decision problem (query non-emptiness), we 
show that the sufficient condition identified in [12] is in fact necessary, too: 



The following are equivalent: 

(1) For every database DB, $red(v\text{-}hw_k(Q), d\text{-}hw_k(Q, \mathrm{DB})) \neq \emptyset$ entails
$Q^{\mathrm{DB}} \neq \emptyset$.

(2) Q has a core having generalized hypertree width at most k. 

We point out that the result is not an immediate corollary of the previous 
one about tree projections (by setting $\mathcal{H}_\mathcal{V} = \mathcal{H}^k_Q$, where $\mathcal{H}^k_Q$ is the hypergraph
where each hyperedge is the set of variables occurring in some group of at most
k query atoms). Indeed, let Q' be any core of Q, and recall that Q' may be
much smaller than Q. Thus, the set of views that can be used to form a k-width
generalized hypertree decomposition of Q' comes only from groups of at most k
atoms occurring in Q'. It follows that this set can be much smaller than $\mathcal{V}_k$,
which is built from the full query Q. For another difference between our general
result and the above one, note that the database $d\text{-}hw_k(Q, \mathrm{DB})$ for the available
views, over which local consistency is considered, is functionally determined by
the relations of query atoms in DB (instead of being almost arbitrary). 

Note that, for k = 1, local consistency is required to hold only on the query
views playing the role of the original query atoms. We thus obtain the precise 
characterization of the power of local consistency in acyclic queries, generalizing 
the classical result in [7] given for queries without multiple occurrences of the 
same relation symbol: for every database DB, local consistency (of query views) 
entails $Q^{\mathrm{DB}} \neq \emptyset$ if, and only if, Q has an acyclic core.

(Q2) As discussed above, the classes of instances where enforcing local consistency
is a correct query-answering procedure are not efficiently recognizable.
Therefore, it is natural to look for subclasses that are efficiently recognizable
and that are strictly larger than the islands of tractability known so far. Addressing
this issue is the second main achievement of the paper. To this end, we
exploit the game-theoretic characterization of tree projections in terms of the
Captain and Robber game [HD]. The game is played on a pair of hypergraphs
$(\mathcal{H}_1, \mathcal{H}_2)$ by a Captain controlling, at each move, a squad of cops encoded as
the nodes in a hyperedge $h \in edges(\mathcal{H}_2)$, and by a Robber who stands on a node
and can run at great speed along the edges of $\mathcal{H}_1$, while not being permitted to
run through a node that is controlled by a cop. In particular, the Captain may
ask any cops in the squad h to go into action, as long as they occupy nodes that
are currently reachable by the Robber, thereby blocking an escape path for the
Robber. While cops move, the Robber may run through those positions that are
left by cops or not yet occupied. The goal of the Captain is to place a cop on
the node occupied by the Robber, while the Robber tries to avoid her capture.
The Captain has a winning strategy if, and only if, there is a tree projection of
$\mathcal{H}_1$ w.r.t. $\mathcal{H}_2$. Then,

► We define the notion of greedy strategies, which are winning strategies for 
the Captain, possibly non-monotone, where it is required that all cops 
available at the current squad h and reachable by the Robber enter in 
action. If all of them are in action, then a new squad h' is selected, again 
requiring that all the active cops, i.e., those in the frontier, enter in action. 
In the Captain and Robber game, it is known that there is no incentive 
for the Captain to play a strategy that is not monotone [30]. Instead, by
focusing on greedy strategies, we can exhibit examples where there exist
non-monotone winning strategies but no monotone winning one.

► We show that greedy strategies can be computed in polynomial time, and 
that based on them (even on non-monotone ones) it is possible to con- 
struct, again in polynomial time, tree projections, which are called greedy. 
Therefore, the class $C_{gtp} \subseteq C_{tp}$ of all greedy tree projections
turns out to be an island of tractability.

► Finally, we show that $C_{gtp}$ properly includes most previously known
islands of tractability (based on structural properties), precisely because of
the power of non-monotonic strategies. In particular, the novel notion of
greedy tree projections allows us to define new islands of tractability from
any known structural decomposition method, such as the greedy (generalized)
hypertree decomposition or the greedy component decomposition,
which are tractable and strictly more powerful than their original versions.

1.5 Organization 

The paper is organized as follows. Section 2 illustrates some basic notions and
concepts. The characterization of the power of local consistency is given in
Section 3, while its application to structural decomposition methods is reported in
Section 4. Islands of tractability for tree projections are singled out in Section 5,
and an application of the results to structures having "small" arities is presented
in Section 6. A few further remarks and open issues are discussed in Section 7.

2 Preliminaries 

Hypergraphs and Acyclicity. A hypergraph $\mathcal{H}$ is a pair $(V, H)$, where
$V$ is a finite set of nodes and $H$ is a set of hyperedges such that, for each
$h \in H$, $h \subseteq V$. If $|h| = 2$ for each (hyper)edge $h \in H$, then
$\mathcal{H}$ is a graph. For the sake of simplicity, we always denote $V$ and $H$
by $\mathit{nodes}(\mathcal{H})$ and $\mathit{edges}(\mathcal{H})$, respectively.

A hypergraph $\mathcal{H}$ is acyclic (more precisely, $\alpha$-acyclic [18]) if,
and only if, it has a join tree [5]. A join tree $JT$ for a hypergraph
$\mathcal{H}$ is a tree whose vertices are the hyperedges of $\mathcal{H}$ such
that, whenever a node $X \in V$ occurs in two hyperedges $h_1$ and $h_2$ of
$\mathcal{H}$, then $h_1$ and $h_2$ are connected in $JT$, and $X$ occurs in each
vertex on the unique path linking $h_1$ and $h_2$. In words, the set of vertices
in which $X$ occurs induces a (connected) subtree of $JT$. We will refer to this
condition as the connectedness condition of join trees.
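
As an illustration of acyclicity, the following minimal sketch tests it with the classical GYO (Graham, Yu, Ozsoyoglu) reduction, which is known to succeed exactly on the hypergraphs that have a join tree. This is a swapped-in standard technique rather than anything defined in the paper; the function name `is_acyclic` and the encoding of hyperedges as Python sets are assumptions made for the example.

```python
def is_acyclic(hyperedges):
    """GYO reduction: True iff the hypergraph is alpha-acyclic.

    `hyperedges` is an iterable of collections of node names (an assumed encoding).
    """
    edges = [set(h) for h in hyperedges]
    while True:
        before = [set(e) for e in edges]
        # Rule 1: drop every node that occurs in exactly one hyperedge.
        for e in edges:
            e -= {x for x in e if sum(x in f for f in edges) == 1}
        # Rule 2: drop empty hyperedges, duplicates, and hyperedges that are
        # properly contained in another hyperedge.
        edges = [
            e for i, e in enumerate(edges)
            if e and not any((e < f) or (e == f and j < i)
                             for j, f in enumerate(edges) if j != i)
        ]
        if edges == before:
            # No rule applies any more: acyclic iff nothing is left.
            return not edges


triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
print(is_acyclic(triangle))                      # the triangle has no join tree
print(is_acyclic(triangle + [{"A", "B", "C"}]))  # the big hyperedge absorbs the cycle
```
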

Tree Decompositions. A tree decomposition [42] of a graph $G$ is a pair
$\langle T, \chi \rangle$, where $T = (N, E)$ is a tree, and $\chi$ is a labeling
function assigning to each vertex $p \in N$ a set of vertices
$\chi(p) \subseteq \mathit{nodes}(G)$, such that the following conditions are
satisfied: (1) for each node $Y \in \mathit{nodes}(G)$, there exists $p \in N$
such that $Y \in \chi(p)$; (2) for each edge $\{X, Y\} \in \mathit{edges}(G)$,
there exists $p \in N$ such that $\{X, Y\} \subseteq \chi(p)$; and (3) for each
node $Y \in \mathit{nodes}(G)$, the set $\{p \in N \mid Y \in \chi(p)\}$ induces a
(connected) subtree of $T$. The width of $\langle T, \chi \rangle$ is the number
$\max_{p \in N}(|\chi(p)| - 1)$.
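
The three conditions can be checked mechanically. The sketch below is only an illustration under stated assumptions: the names `is_tree_decomposition` and `width` are hypothetical, the labeling is encoded as a dictionary from tree vertices to sets (bags), and the input `tree_edges` is assumed to already describe a tree. Condition (3) uses the fact that a vertex subset of a tree induces a connected subtree exactly when it spans one edge fewer than its size.

```python
def is_tree_decomposition(graph_nodes, graph_edges, tree_edges, bags):
    """Check conditions (1)-(3); `bags` maps each tree vertex p to its label chi(p)."""
    # (1) every graph node occurs in some bag
    if any(all(y not in bag for bag in bags.values()) for y in graph_nodes):
        return False
    # (2) every graph edge is contained in some bag
    if any(all(not set(e) <= bag for bag in bags.values()) for e in graph_edges):
        return False
    # (3) the tree vertices whose bag contains y induce a connected subtree
    for y in graph_nodes:
        holders = {p for p, bag in bags.items() if y in bag}
        inner = sum(1 for p, q in tree_edges if p in holders and q in holders)
        if inner != len(holders) - 1:
            return False
    return True


def width(bags):
    """Width of the decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A 4-cycle A-B-C-D-A decomposed into two bags joined by one tree edge.
cycle_nodes = {"A", "B", "C", "D"}
cycle_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
bags = {1: {"A", "B", "C"}, 2: {"A", "C", "D"}}
print(is_tree_decomposition(cycle_nodes, cycle_edges, [(1, 2)], bags), width(bags))
```
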

The Gaifman graph of a hypergraph $\mathcal{H}$ is defined over the set
$\mathit{nodes}(\mathcal{H})$ of the nodes of $\mathcal{H}$, and contains an edge
$\{X, Y\}$ if, and only if, $\{X, Y\} \subseteq h$ holds, for some hyperedge
$h \in \mathit{edges}(\mathcal{H})$. The treewidth of $\mathcal{H}$ is the minimum
width over all the tree decompositions of its Gaifman graph. Deciding whether a
given hypergraph has treewidth bounded by a fixed natural number $k$ is known to be
feasible in linear time [9].

(Generalized) Hypertree Decompositions. A hypertree for a hypergraph
$\mathcal{H}$ is a triple $\langle T, \chi, \lambda \rangle$, where $T = (N, E)$
is a rooted tree, and $\chi$ and $\lambda$ are labeling functions which associate
each vertex $p \in N$ with two sets $\chi(p) \subseteq \mathit{nodes}(\mathcal{H})$
and $\lambda(p) \subseteq \mathit{edges}(\mathcal{H})$. If $T' = (N', E')$ is a
subtree of $T$, we define $\chi(T') = \bigcup_{v \in N'} \chi(v)$. In the
following, for any rooted tree $T$, we denote the set of vertices $N$ of $T$ by
$\mathit{vertices}(T)$, and the root of $T$ by $\mathit{root}(T)$. Moreover, for
any $p \in N$, $T_p$ denotes the subtree of $T$ rooted at $p$.

A generalized hypertree decomposition [33] of a hypergraph $\mathcal{H}$ is a
hypertree $HD = \langle T, \chi, \lambda \rangle$ for $\mathcal{H}$ such that:
(1) for each hyperedge $h \in \mathit{edges}(\mathcal{H})$, there exists
$p \in \mathit{vertices}(T)$ such that $h \subseteq \chi(p)$; (2) for each node
$Y \in \mathit{nodes}(\mathcal{H})$, the set
$\{p \in \mathit{vertices}(T) \mid Y \in \chi(p)\}$ induces a (connected) subtree
of $T$; and (3) for each $p \in \mathit{vertices}(T)$,
$\chi(p) \subseteq \mathit{nodes}(\lambda(p))$. The width of a generalized
hypertree decomposition $\langle T, \chi, \lambda \rangle$ is
$\max_{p \in \mathit{vertices}(T)} |\lambda(p)|$. The generalized hypertree
width $\mathit{ghw}(\mathcal{H})$ of $\mathcal{H}$ is the minimum width over all
its generalized hypertree decompositions.

A hypertree decomposition [24] of $\mathcal{H}$ is a generalized hypertree
decomposition $HD = \langle T, \chi, \lambda \rangle$ where: (4) for each
$p \in \mathit{vertices}(T)$,
$\mathit{nodes}(\lambda(p)) \cap \chi(T_p) \subseteq \chi(p)$.
Note that the inclusion in the above condition is actually an equality, because
Condition (3) implies the reverse inclusion. The hypertree width
$\mathit{hw}(\mathcal{H})$ of $\mathcal{H}$ is the minimum width over all its
hypertree decompositions. Note that, for any hypergraph $\mathcal{H}$, it is the
case that
$\mathit{ghw}(\mathcal{H}) \le \mathit{hw}(\mathcal{H}) \le 3 \times \mathit{ghw}(\mathcal{H}) + 1$ [5].
Moreover, for any fixed natural number $k > 0$, deciding whether
$\mathit{hw}(\mathcal{H}) \le k$ is feasible in polynomial time (and, actually, it
is highly parallelizable) [23], while deciding whether
$\mathit{ghw}(\mathcal{H}) \le k$ is NP-complete [55].

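Tree Projections. For two hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$, we
write $\mathcal{H}_1 \le \mathcal{H}_2$ if, and only if, each hyperedge of
$\mathcal{H}_1$ is contained in at least one hyperedge of $\mathcal{H}_2$.
Let $\mathcal{H}_1 \le \mathcal{H}_2$; then, a tree projection of $\mathcal{H}_1$
with respect to $\mathcal{H}_2$ is an acyclic hypergraph $\mathcal{H}_a$ such that
$\mathcal{H}_1 \le \mathcal{H}_a \le \mathcal{H}_2$. Whenever such a hypergraph
$\mathcal{H}_a$ exists, we say that the pair of hypergraphs
$(\mathcal{H}_1, \mathcal{H}_2)$ has a tree projection.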
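
The definition can be read as a checkable certificate: given a candidate $\mathcal{H}_a$, verify that it is acyclic and sandwiched between the two hypergraphs. The sketch below does only this verification (it does not search for a tree projection). The names `covered_by` and `is_tree_projection` are hypothetical, acyclicity is tested with the standard GYO reduction (an assumption, not the paper's method), and the sample data are the hypergraphs of the triangle query over $A, B, C$ and of the nine views of the paper's running example.

```python
def covered_by(h1, h2):
    """H1 <= H2: every hyperedge of h1 is contained in some hyperedge of h2."""
    return all(any(set(e) <= set(f) for f in h2) for e in h1)


def is_acyclic(hyperedges):
    """GYO reduction: True iff the hypergraph is alpha-acyclic."""
    edges = [set(h) for h in hyperedges]
    while True:
        before = [set(e) for e in edges]
        for e in edges:  # drop nodes occurring in a single hyperedge
            e -= {x for x in e if sum(x in f for f in edges) == 1}
        edges = [        # drop empty, duplicate, or properly contained hyperedges
            e for i, e in enumerate(edges)
            if e and not any((e < f) or (e == f and j < i)
                             for j, f in enumerate(edges) if j != i)
        ]
        if edges == before:
            return not edges


def is_tree_projection(ha, h1, h2):
    """Check that ha is acyclic and that h1 <= ha <= h2."""
    return is_acyclic(ha) and covered_by(h1, ha) and covered_by(ha, h2)


triangle_abc = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
views = [{"A", "B", "C"}, {"A", "F"}, {"A", "B"}, {"A", "C"}, {"A", "E"},
         {"B", "C"}, {"D", "B"}, {"D", "C"}, {"F", "E"}]
print(is_tree_projection([{"A", "B", "C"}], triangle_abc, views))
```
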

Note that the notion of tree projection is more general than the above-mentioned
(hyper)graph-based notions. For instance, consider the generalized hypertree
decomposition approach. Given a hypergraph $\mathcal{H}$ and a natural number
$k > 0$, let $\mathcal{H}^k$ denote the hypergraph over the same set of nodes as
$\mathcal{H}$, and whose set of hyperedges is given by all possible unions of $k$
edges in $\mathcal{H}$, i.e.,
$\mathit{edges}(\mathcal{H}^k) = \{h_1 \cup h_2 \cup \cdots \cup h_k \mid \{h_1, h_2, \ldots, h_k\} \subseteq \mathit{edges}(\mathcal{H})\}$.
Then, it is well known and easy to see that $\mathcal{H}$ has generalized hypertree
width at most $k$ if, and only if, there is a tree projection for
$(\mathcal{H}, \mathcal{H}^k)$.
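
A small sketch of how $\mathcal{H}^k$ could be materialized is given below. The function name `k_union_hypergraph` is hypothetical, and the sketch assumes that the edges $h_1, \ldots, h_k$ need not be distinct, so that the result contains the unions of at most $k$ hyperedges.

```python
from itertools import combinations


def k_union_hypergraph(hyperedges, k):
    """Hyperedges of H^k: all unions of at most k hyperedges of H."""
    edges = {frozenset(h) for h in hyperedges}
    result = set()
    for size in range(1, k + 1):
        for combo in combinations(edges, size):
            result.add(frozenset().union(*combo))
    return result


triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
# With k = 2 the hyperedge {A, B, C} becomes available, so the acyclic
# hypergraph consisting of that single hyperedge covers the triangle.
print(frozenset("ABC") in k_union_hypergraph(triangle, 2))
```
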

Similarly, for tree decompositions, let $\mathcal{H}^{tk}$ be the hypergraph over
the same set of nodes as $\mathcal{H}$, and whose set of hyperedges is given by all
possible clusters $B \subseteq \mathit{nodes}(\mathcal{H})$ of nodes such that
$|B| \le k + 1$. Then, $\mathcal{H}$ has treewidth at most $k$ if, and only if,
there is a tree projection for $(\mathcal{H}, \mathcal{H}^{tk})$.

Relational Structures and Homomorphisms. Let $\mathcal{U}$ and $\mathcal{X}$ be
disjoint infinite sets that we call the universe of constants and the universe of
variables, respectively. A (relational) vocabulary $\tau$ is a finite set of
relation symbols of specified (finite) arities. A relational structure
$\mathbb{A}$ over $\tau$ (short: $\tau$-structure) consists of a universe
$A \subseteq \mathcal{U} \cup \mathcal{X}$ and, for each relation symbol $r$ in
$\tau$, of a relation $r^{\mathbb{A}} \subseteq A^{\rho}$, where $\rho$ is the
arity of $r$.

Let $\mathbb{A}$ and $\mathbb{B}$ be two $\tau$-structures with universes $A$ and
$B$, respectively. A homomorphism from $\mathbb{A}$ to $\mathbb{B}$ is a mapping
$h : A \mapsto B$ such that $h(c) = c$ for each constant $c$ in
$A \cap \mathcal{U}$, and such that, for each relation symbol $r$ in $\tau$ and
for each tuple $\langle a_1, \ldots, a_\rho \rangle \in r^{\mathbb{A}}$, it holds
that $\langle h(a_1), \ldots, h(a_\rho) \rangle \in r^{\mathbb{B}}$. For any
mapping $h$ (not necessarily a homomorphism),
$h(\langle a_1, \ldots, a_\rho \rangle)$ is used, as usual, as a shorthand for
$\langle h(a_1), \ldots, h(a_\rho) \rangle$.

A $\tau$-structure $\mathbb{A}$ is a substructure of a $\tau$-structure
$\mathbb{B}$ if $A \subseteq B$ and $r^{\mathbb{A}} \subseteq r^{\mathbb{B}}$,
for each relation symbol $r$ in $\tau$.

Relational Databases. Let $\tau$ be a given vocabulary. A database instance (or,
simply, a database) DB over $D \subseteq \mathcal{U}$ is a $\tau$-structure DB
whose universe is the set $D$ of constants. For each relation symbol $r$ in
$\tau$, $r^{\mathrm{DB}}$ is a relation instance (or, simply, relation) of DB.
Sometimes, we adopt the logical representation of a database [51, 1], where a
tuple $\langle a_1, \ldots, a_\rho \rangle$ of values from $D$ belonging to the
$\rho$-ary relation (over symbol) $r$ is identified with the ground atom
$r(a_1, \ldots, a_\rho)$. Accordingly, a database DB can be viewed as a set of
ground atoms. Unless otherwise stated, we implicitly assume that databases are
finite.

Conjunctive Queries. A conjunctive query $Q$ consists of a finite conjunction
of atoms of the form $r_1(\mathbf{u}_1) \wedge \cdots \wedge r_m(\mathbf{u}_m)$,
where $r_1, \ldots, r_m$ (with $m > 0$) are relation symbols (not necessarily
distinct), and $\mathbf{u}_1, \ldots, \mathbf{u}_m$ are lists of terms (i.e.,
variables or constants). The set of all atoms occurring in $Q$ is denoted by
$\mathit{atoms}(Q)$. For a set of atoms $A$, $\mathit{vars}(A)$ is the set of
variables occurring in the atoms in $A$. For short, $\mathit{vars}(Q)$ denotes
$\mathit{vars}(\mathit{atoms}(Q))$. We say that $Q$ is a simple query if every
atom is over a distinct relation symbol. Given a database DB over $D$,
$Q^{\mathrm{DB}}$ denotes the set of all answers of $Q$ on DB, that is, all
substitutions $\theta : \mathit{vars}(Q) \mapsto D$ such that, for each
$1 \le i \le m$, $\theta'(r_i(\mathbf{u}_i)) \in \mathrm{DB}$, where
$\theta'(t) = \theta(t)$ if $t \in \mathit{vars}(Q)$ and $\theta'(t) = t$
otherwise (i.e., if the term $t$ is a constant).

Note that any conjunctive query $Q$ can be viewed as a relational structure
$\mathbb{Q}$, whose vocabulary $\tau_Q$ and universe $U_Q$ are the set of relation
symbols and the set of terms occurring in its atoms, respectively. For each symbol
$r_i \in \tau_Q$, the relation $r_i^{\mathbb{Q}}$ contains a tuple of terms
$\mathbf{u}$, for any atom of the form $r_i(\mathbf{u}) \in \mathit{atoms}(Q)$
defined over $r_i$. In the special case of simple queries, every relation
$r_i^{\mathbb{Q}}$ of $\mathbb{Q}$ contains just one tuple of terms. According to
this view, elements in $Q^{\mathrm{DB}}$ are in a one-to-one correspondence with
homomorphisms from $\mathbb{Q}$ to $\mathbb{DB}_Q$, where the latter is the
(maximal) substructure of $\mathbb{DB}$ over the (sub)vocabulary $\tau_Q$.
Hereafter, we adopt this view but, for the sake of presentation, we identify
queries and databases with their relational structures, i.e., we use directly $Q$
and DB in place of $\mathbb{Q}$ and $\mathbb{DB}_Q$.

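For any given set $S$ of variables, we denote by $Q^{\mathrm{DB}}[S]$ the
restriction of the (substitutions/)homomorphisms in $Q^{\mathrm{DB}}$ over the
variables in $S$. For the extreme case where $S = \emptyset$, define
$h_{\mathit{true}}$ to be the restriction of any homomorphism over the empty set.
Then, $Q^{\mathrm{DB}}[\emptyset] = \{h_{\mathit{true}}\}$ if
$Q^{\mathrm{DB}} \neq \emptyset$, and $Q^{\mathrm{DB}}[\emptyset] = \emptyset$ if
$Q^{\mathrm{DB}} = \emptyset$. If $a$ is an atom, then $Q^{\mathrm{DB}}[a]$
denotes $Q^{\mathrm{DB}}[\mathit{vars}(a)]$.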
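
To make the definitions of answers and of their restriction concrete, here is a minimal sketch of a naive evaluator. Everything in it is an assumption made for illustration: the names `answers` and `restrict`, the encoding of atoms as `(relation, terms)` pairs, the explicit set `variables` telling variables from constants, and the sample relation `r`. Substitutions are extended atom by atom, so this is not an efficient algorithm.

```python
def answers(atoms, db, variables):
    """All substitutions theta with theta'(r_i(u_i)) in DB for every atom.

    atoms: list of (relation name, tuple of terms); db: relation name -> set of
    tuples; variables: the terms to be treated as variables.
    """
    partial = [{}]
    for rel, terms in atoms:
        extended = []
        for theta in partial:
            for tup in db.get(rel, ()):
                candidate = dict(theta)
                # A variable must keep one value; a constant must match itself.
                if all(
                    candidate.setdefault(t, v) == v if t in variables else t == v
                    for t, v in zip(terms, tup)
                ):
                    extended.append(candidate)
        partial = extended
    return partial


def restrict(homs, S):
    """Restriction of a set of substitutions to the variables in S."""
    return {frozenset((x, h[x]) for x in S) for h in homs}


r = {(1, 2), (2, 3), (1, 3), (3, 4)}
triangle = [("r", ("A", "B")), ("r", ("B", "C")), ("r", ("A", "C"))]
found = answers(triangle, {"r": r}, {"A", "B", "C"})
print(found)
print(restrict(found, {"A", "C"}))
```

With this encoding, restricting a non-empty set of answers to the empty set of variables yields a single element (the empty mapping, playing the role of $h_{\mathit{true}}$), while restricting an empty set of answers yields the empty set.
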

Note that any atom $a$ can be viewed as a one-atom query, so that
$a^{\mathrm{DB}}$ is the set of all the homomorphisms from $a$ to DB, restricted
to $\mathit{vars}(a)$ (i.e., projecting out possible constants occurring in $a$).
For a set $A$ of atoms, we denote by $A^{\mathrm{DB}}$ the set
$\{a^{\mathrm{DB}} \mid a \in A\}$.

A core of $Q$ is a query $Q'$ such that: (1)
$\mathit{atoms}(Q') \subseteq \mathit{atoms}(Q)$; (2) there is a homomorphism
from $Q$ to $Q'$; and (3) there is no query $Q''$ satisfying (1) and (2) such that
$\mathit{atoms}(Q'') \subset \mathit{atoms}(Q')$. Equivalently, in terms of
relational structures, $Q'$ is a minimal substructure of $Q$ such that (2) holds.
The set of all the cores of $Q$ is denoted by $\mathit{cores}(Q)$. Elements in
$\mathit{cores}(Q)$ are isomorphic.

Hypergraphs and atoms. There is a very natural way to associate a hypergraph
$\mathcal{H}_{\mathcal{V}} = (N, H)$ with any set $\mathcal{V}$ of atoms: the set
$N$ of nodes consists of all variables occurring in $\mathcal{V}$; for each atom
in $\mathcal{V}$, the set $H$ of hyperedges contains a hyperedge including all its
variables; and no other hyperedge is in $H$.

For a query $Q$, the hypergraph associated with $\mathit{atoms}(Q)$ is briefly
denoted by $\mathcal{H}_Q$. If $\mathcal{H}_Q$ is a connected hypergraph, we say
that $Q$ is a connected query.

3 The Power of Local Consistency 

Throughout the paper, we assume that $Q$ is a conjunctive query and that
$\mathcal{V}$ is a non-empty set of atoms, which we call views, such that
$\mathit{vars}(\mathcal{V}) = \mathit{vars}(Q)$. Moreover, DB is a database over
the vocabulary $\mathcal{DS}$ containing the relation symbols of query atoms and
views. We require w.l.o.g. that every available view is over a specific relation
symbol, which does not occur in the given query, and that the list of terms of
every view does not contain any constant or repeated variables (in fact, observe
that from any given set of available views, one may immediately get a new set of
views where these assumptions hold). Note that, within this setting, each view
$w \in \mathcal{V}$ is univocally associated with a relation instance in DB, whose
tuples are in a one-to-one correspondence with the homomorphisms in
$w^{\mathrm{DB}}$. Therefore, this relation instance will be simply denoted by
$w^{\mathrm{DB}}$, and we freely use the term tuples interchangeably with
homomorphisms, when we refer to its elements.

Our first goal is to characterize the relationships between tree projections 
and certain consistency properties that hold for Q and V over some (or all) given 
databases. To this end, we need to state some preliminary notions and defini- 
tions, which will be illustrated by referring to the following running example. 

Example 3.1 Consider the following query $Q_4$, where all atoms are over the
same binary relation symbol $r$:

$Q_4$: $r(A, B) \wedge r(B, C) \wedge r(A, C) \wedge r(D, C) \wedge r(D, B) \wedge r(A, E) \wedge r(F, E)$.

A graphical representation of this query is reported in Figure 2, where edge
orientation just reflects the position of the variables in query atoms. Moreover,
consider the database $\mathrm{DB}_4$ shown in Figure 2, by focusing on the
relation instance $r^{\mathrm{DB}_4}$. Then, it can be checked that the answers of
$Q_4$ on $\mathrm{DB}_4$ are the homomorphisms $h_1, \ldots, h_{10}$, which are
also reported, in tabular form, in Figure 2.

In this example, in order to answer $Q_4$, we assume the availability of the
set of views $\mathcal{V}_4 = \{v_1(A, B, C), v_2(A, F), v_3(A, B), v_4(A, C),
v_5(A, E), v_6(B, C), v_7(D, B), v_8(D, C), v_9(F, E)\}$, and that the database
$\mathrm{DB}_4$ includes a relation instance $w^{\mathrm{DB}_4}$, for each view
$w \in \mathcal{V}_4$. Note that, in the figure, such relation instances are
identified by the list of variables on which the views are defined.
$\triangleleft$

Figure 2: The (hypergraph of the) query $Q_4$, the tuples in the database
$\mathrm{DB}_4$, and the answers in $Q_4^{\mathrm{DB}_4}$, in Example 3.1.

3.1 Consistency Properties and Views 

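View Consistency. For a view $w \in \mathcal{V}$, we say that $w^{\mathrm{DB}}$ is
view consistent w.r.t. $Q$ if $w^{\mathrm{DB}} \supseteq Q^{\mathrm{DB}}[w]$. For
the set of views $\mathcal{V}$, we say that $\mathcal{V}^{\mathrm{DB}}$ is view
consistent w.r.t. $Q$ if the property holds for each $w \in \mathcal{V}$. That is,
views are not more restrictive than the query.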

Note that view consistency holds in general for all views initialized from
subsets of query atoms, such as those employed in all known decomposition
methods, e.g., (hyper)tree decompositions. However, we are also interested
in a wider framework where views are completely arbitrary and may be available
from previous computations, possibly unrelated to the present query $Q$.
Accordingly, we do not require that view consistency holds for such views, and
we shall look for general results, which will be then smoothly inherited by more
specific settings.

Example 3.2 Consider again the setting of Example 3.1, and in particular the
views $v_1(A, B, C)$ and $v_2(A, F)$. Note that $v_1(A, B, C)^{\mathrm{DB}_4}$ is
a set of two homomorphisms, which are precisely those in the set
$Q_4^{\mathrm{DB}_4}[\{A, B, C\}]$ of the answers of $Q_4$ on $\mathrm{DB}_4$
projected over the variables in $\{A, B, C\}$. Therefore, $v_1(A, B, C)$ is view
consistent w.r.t. $\mathrm{DB}_4$. Similarly, it can be checked that the views
$v_3(A, B)$, $v_4(A, C)$, $v_5(A, E)$, $v_6(B, C)$, $v_7(D, B)$, $v_8(D, C)$, and
$v_9(F, E)$ are all view consistent w.r.t. $\mathrm{DB}_4$.

Instead, $v_2(A, F)$ is not view consistent w.r.t. $\mathrm{DB}_4$, since
$v_2(A, F)^{\mathrm{DB}_4} \supseteq Q_4^{\mathrm{DB}_4}[\{A, F\}]$ does not hold.
For instance, $v_2(A, F)^{\mathrm{DB}_4}$ does not include the homomorphism
mapping both $A$ and $F$ to the constant $a_1$. Hence,
$\mathcal{V}_4^{\mathrm{DB}_4}$ is not view consistent w.r.t. $Q_4$.
$\triangleleft$

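Local Consistency. We say that $\mathcal{V}^{\mathrm{DB}}$ is locally (also,
pairwise) consistent, denoted by $\mathit{lc}(\mathcal{V}, \mathrm{DB})$, if
$w^{\mathrm{DB}} \neq \emptyset$ and
$w^{\mathrm{DB}} = (w \wedge w')^{\mathrm{DB}}[w]$, for each
$\{w, w'\} \subseteq \mathcal{V}$.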

From any set of views and any instance DB, we may compute a subset of
DB that is locally consistent. Let the reduct of DB according to $\mathcal{V}$,
denoted by $\mathit{red}(\mathcal{V}, \mathrm{DB})$, be the (set-inclusion)
maximal subset of DB such that
$\mathcal{V}^{\mathit{red}(\mathcal{V}, \mathrm{DB})}$ is locally consistent; or
$\mathit{red}(\mathcal{V}, \mathrm{DB}) = \emptyset$, whenever such a maximal
subset does not exist. It is well known that the reduct can be computed as the
unique fixpoint of a procedure consisting of semijoin operations over DB, which
runs in polynomial time. It is easy to see that such a reducing procedure preserves
the given query, unless the used views are more restrictive than the query, of
course. In fact, computing a reduct is often used as a heuristic procedure in
different areas of computer science where the homomorphism problem underlying
conjunctive query evaluation arises, e.g., in constraint satisfaction problems
(CSP), where such a procedure is known as generalized arc consistency [16].
Indeed, if the reduct is empty, we may safely conclude that there are no
solutions; otherwise, we obtain a smaller instance of the problem to deal with.
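
The semijoin fixpoint can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure verbatim: the function name `reduct`, the encoding of each view as a pair (tuple of variables, set of tuples), and the convention of returning `None` for an empty reduct are all hypothetical, and views are assumed to have no repeated variables and no constants.

```python
def reduct(views):
    """Semijoin fixpoint over a set of views.

    views: name -> (tuple of variables, set of tuples).
    Returns name -> reduced set of tuples, or None if some relation gets empty.
    """
    rel = {w: set(tuples) for w, (_, tuples) in views.items()}
    changed = True
    while changed:
        changed = False
        for w, (w_vars, _) in views.items():
            for u, (u_vars, _) in views.items():
                if w == u:
                    continue
                shared = [x for x in w_vars if x in u_vars]
                w_pos = [w_vars.index(x) for x in shared]
                u_pos = [u_vars.index(x) for x in shared]
                # Keep the tuples of w that agree with some tuple of u on the
                # shared variables (a semijoin of w with u).
                support = {tuple(t[i] for i in u_pos) for t in rel[u]}
                kept = {t for t in rel[w] if tuple(t[i] for i in w_pos) in support}
                if kept != rel[w]:
                    rel[w] = kept
                    changed = True
    if any(not tuples for tuples in rel.values()):
        return None
    return rel


views = {
    "a": (("X", "Y"), {(1, 2), (3, 4)}),
    "b": (("Y", "Z"), {(2, 5)}),
}
print(reduct(views))
```

In this usage example the tuple `(3, 4)` of `a` has no matching tuple in `b` on `Y`, so it is filtered out, while both remaining tuples support each other.
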

Example 3.3 In the running example depicted in Figure 2, the set $\mathcal{V}_4$
of views and the database $\mathrm{DB}_4$ are such that
$\mathcal{V}_4^{\mathrm{DB}_4}$ is locally consistent. Consider for instance the
views $v_1(A, B, C)$ and $v_3(A, B)$, and observe that both
$(v_1(A, B, C) \wedge v_3(A, B))^{\mathrm{DB}_4}[\{A, B, C\}] = v_1(A, B, C)^{\mathrm{DB}_4}$
and
$(v_3(A, B) \wedge v_1(A, B, C))^{\mathrm{DB}_4}[\{A, B\}] = v_3(A, B)^{\mathrm{DB}_4}$.
Indeed, every tuple in the relation associated with either view matches with some
tuple in the other view on the variables they have in common, so that no tuple is
missed by performing such semijoin operations. This is easily seen because
$v_1(A, B, C)^{\mathrm{DB}_4}[\{A, B\}] = v_3(A, B)^{\mathrm{DB}_4} = \{\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle\}$
(where these two tuples also identify the homomorphisms mapping $\langle A, B \rangle$
to $\langle a_1, b_1 \rangle$ and to $\langle a_2, b_2 \rangle$, respectively).
$\triangleleft$

Query Views. In the seminal paper about local and global consistency in
acyclic queries [7], local consistency is enforced directly on the relations of
query atoms, whereas in this paper we only consider (and possibly enforce) this
property on views. This is because that paper, as well as other related papers
such as [29], uses a slightly different formal framework where every relation
symbol may occur just once in a query, i.e., where only simple queries are
considered. In contrast with these classical papers, we do not assume anything
about the query, which may contain multiple occurrences of the same relation
symbol. This means that the same relation instance may be shared by different
query atoms, and this feature plays a very relevant role, as it was first pointed
out in [15]. In this case, a tuple may be useful for some atom and useless for
another one defined over the same relation symbol. It follows that local
consistency cannot be enforced on the relations of the query atoms, because such
a filtering procedure would lead to undesirable side effects (possibly deleting
all tuples in the database, including the useful ones).

Therefore, we always keep the "original" database relations untouched and 
we rather use suitable views, each one with its own database relation, to play 
the role of query atoms in the definition of consistency properties in general 
queries and in consistency enforcing procedures. Formally, we say that
$\mathcal{V}$ is a view system (for $Q$) if it contains, for each atom
$q \in \mathit{atoms}(Q)$, a view $w_q$ (over a distinct relation symbol) with the
same set of variables as $q$. These special views in $\mathcal{V}$ are called
hereafter query views, and are denoted by $\mathit{views}(Q)$. If $Q'$ is a
subquery of $Q$, $\mathit{views}(Q')$ denotes the set of query views associated
with its atoms. In the following, the set of available views $\mathcal{V}$ is
assumed to be a view system for the given query $Q$, unless otherwise specified.

Example 3.4 Consider again the setting of Example 3.1, and note that
$\mathcal{V}_4$ is in fact a view system for $Q_4$. Indeed, the views in the set
$\{v_3(A, B), v_4(A, C), v_5(A, E), v_6(B, C), v_7(D, B), v_8(D, C), v_9(F, E)\}$
are in a one-to-one correspondence with the query atoms of $Q_4$. For instance,
$v_4(A, C)$ is the query view $w_{r(A, C)}$, with $r(A, C)$ being a query atom of
$Q_4$. Hence, $\mathit{views}(Q_4) = \{v_3(A, B), v_4(A, C), v_5(A, E), v_6(B, C),
v_7(D, B), v_8(D, C), v_9(F, E)\}$, and
$\mathcal{V}_4 = \mathit{views}(Q_4) \cup \{v_1(A, B, C), v_2(A, F)\}$.
$\triangleleft$

Observe that working with view systems instead of arbitrary sets of
views is not a restrictive assumption, for our purposes. On the practical side,
if some atom misses its associated query view $w_q$ in the available views, one
may just add a fresh view $w_q$ to the views, with a corresponding relation in the
database such that $w_q^{\mathrm{DB}} = q^{\mathrm{DB}}$. On the theoretical side,
recall that we are dealing with consistency properties of $Q$ and $\mathcal{V}$,
and with tree projections of $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$.
In fact, such a tree projection exists only if the set of variables of every atom
$q$ in $Q$ is covered by some view $w \in \mathcal{V}$, i.e.,
$\mathit{vars}(q) \subseteq \mathit{vars}(w)$. Therefore, whenever $\mathcal{V}$
is a set of "useful views," for each query atom $q$ there must exist some view in
$\mathcal{V}$ that may play the role of the query view $w_q$ (after projecting
it on $\mathit{vars}(q)$). However, requiring that query views belong to
$\mathcal{V}$ simplifies the presentation and allows us to define consistency
properties in a clean way. In particular, the role of query views is crucial in
the following definition.

Global Consistency. Informally, this is a highly desirable state of the database
where query views contain all and only those tuples that can be returned by
query answers. In this case, an answer of the query can be computed in polynomial
time: for each query view $w_q$, select one tuple $h$ in the relation
$w_q^{\mathrm{DB}}$ that is univocally associated with $w_q$ in DB, modify this
relation so that $w_q^{\mathrm{DB}} = \{h\}$, and propagate this choice by
enforcing again local consistency (see Section H] for more results and discussions
about the problem of computing answers).

Observe that the classical definition, which states the above property for the
relations of query atoms, is not useful whenever any relation symbol $r$ is shared
by some query atoms (because we miss the information relating any tuple in
$r^{\mathrm{DB}}$ with those atoms where the tuple participates in some answer).
By using query views instead of query atoms, no confusion may arise, and we get
the desired extension of the classical definition given (in the literature
discussed above) for simple queries.

We say that a database DB is globally consistent with respect to $Q$ and
$\mathcal{V}$, denoted by $\mathit{gc}(\mathcal{V}, \mathrm{DB}, Q)$, if
$w_q^{\mathrm{DB}} = Q^{\mathrm{DB}}[q]$ (which is also equal to
$Q^{\mathrm{DB}}[w_q]$), for each $q \in \mathit{atoms}(Q)$, where $w_q$ is the
query view associated with $q$.
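
For small instances, global consistency can be tested by brute force, which also makes it easy to see that local consistency alone does not imply it. The sketch below rests on several assumptions made only for illustration: the names `query_answers` and `globally_consistent`, atoms without constants, query views listed in the same order as the atoms they are associated with, and an exhaustive enumeration over the active domain (exponential, so suitable for toy examples only).

```python
from itertools import product


def query_answers(atoms, db):
    """Brute-force answers of a constant-free conjunctive query.

    atoms: list of (relation name, tuple of variables); db: name -> set of tuples.
    """
    variables = sorted({x for _, vs in atoms for x in vs})
    domain = sorted({c for tuples in db.values() for t in tuples for c in t})
    found = []
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(tuple(theta[x] for x in vs) in db[rel] for rel, vs in atoms):
            found.append(theta)
    return found


def globally_consistent(atoms, query_views, db):
    """True iff each query view holds exactly the projected answers.

    query_views[i] is the relation name of the query view of atoms[i].
    """
    ans = query_answers(atoms, db)
    for (_, vs), view in zip(atoms, query_views):
        projected = {tuple(theta[x] for x in vs) for theta in ans}
        if db[view] != projected:
            return False
    return True


# 2-coloring of a triangle: every pair of views agrees on shared variables and
# no relation is empty, yet the query has no answer at all.
swap = {(0, 1), (1, 0)}
atoms = [("r", ("A", "B")), ("r", ("B", "C")), ("r", ("A", "C"))]
db = {"r": set(swap), "w1": set(swap), "w2": set(swap), "w3": set(swap)}
print(globally_consistent(atoms, ["w1", "w2", "w3"], db))
```
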

Example 3.5 Let us focus on the query views in $\mathit{views}(Q_4)$. Consider for
instance the view $v_3(A, B) \in \mathit{views}(Q_4)$ (associated with the query
atom $r(A, B)$), and note that
$v_3(A, B)^{\mathrm{DB}_4} = Q_4^{\mathrm{DB}_4}[\{A, B\}]$. That is, the answers
of $Q_4$ on $\mathrm{DB}_4$ projected over the set $\{A, B\}$ are immediately
available by looking at the relation $v_3(A, B)^{\mathrm{DB}_4}$.

On the other hand, for the view $v_8(D, C) \in \mathit{views}(Q_4)$, the set
$v_8(D, C)^{\mathrm{DB}_4}$ contains two homomorphisms that do not belong to the
set $Q_4^{\mathrm{DB}_4}[\{D, C\}]$ (identified by the two tuples marked with the
symbol "►" in Figure 2). Therefore, $\mathrm{DB}_4$ is not globally consistent
w.r.t. $Q_4$ and $\mathcal{V}_4$. $\triangleleft$

Legal Database. While no special requirement is assumed for the database 
relations of the available views in V, the relations associated with the query views 
cannot be arbitrary, otherwise we would lose any connection with the query Q 
to be solved using the view system V. In fact, these relations should reflect the 
intended initialization with the tuples contained in the relations associated with 
their corresponding query atoms (possibly filtered by eliminating tuples that 
are irrelevant w.r.t. query answers). 

We say that DB is a legal database instance (w.r.t. $Q$ and $\mathcal{V}$) if (i)
$w_q^{\mathrm{DB}} \subseteq q^{\mathrm{DB}}$ holds, for each query view
$w_q \in \mathit{views}(Q)$; and (ii) $\mathit{views}(Q)^{\mathrm{DB}}$ is view
consistent. All other view instances may be arbitrary. Then, the following is
immediate.

Fact 3.6 For every legal database DB, 



Example 3.7 The database $\mathrm{DB}_4$ is legal w.r.t. $Q_4$ and
$\mathcal{V}_4$. Indeed, condition (i) is seen to hold by comparing the relations
associated with the query views with the relation instance $r^{\mathrm{DB}_4}$.
Moreover, in Example 3.2 we have observed that
$\mathit{views}(Q_4)^{\mathrm{DB}_4}$ is view consistent, i.e., condition (ii)
holds as well. Then, because of the above fact, the answers of $Q_4$ on
$\mathrm{DB}_4$ are also given by the

Remark 3.8 Only legal databases over $Q$ and $\mathcal{V}$ are meaningful for the
purpose of this paper. Therefore, unless otherwise stated, we always implicitly
assume
of this paper. Therefore, unless otherwise stated, we always implicitly assume 
hereafter this requirement for any database instance. In particular, whenever 
we say "for every database", we actually mean "for every legal database". Of 
course, whenever we define some database instance in proofs of our results, 
we deal with this requirement, and we explicitly prove that such a database is 
actually legal. 

Now that the setting is clarified, our next task is to provide sufficient and
necessary conditions to evaluate queries via local consistency. For the sake of
presentation and without loss of generality, we assume that the given query $Q$
is connected and that $\mathit{vars}(Q) = \mathit{vars}(\mathcal{V})$. Note that,
under these assumptions, whenever $\mathcal{V}^{\mathrm{DB}}$ is locally
consistent, requiring that every relation associated with some view in
$\mathcal{V}$ is non-empty is equivalent to requiring that there is at least one
$w$ in $\mathcal{V}$ with $w^{\mathrm{DB}} \neq \emptyset$. Indeed, the query
views in the view system $\mathcal{V}$ make $\mathcal{H}_{\mathcal{V}}$ connected,
and thus any empty relation in the database would entail that all relations must
be empty, at local consistency.



3.2 From Tree Projections to Consistency...

The fact that local consistency holds for $\mathcal{V}$ and DB is of course
unrelated to the fact that global consistency holds for $\mathcal{V}$ and DB with
respect to $Q$, in general. In this section, we show how the existence of tree
projections of some parts of the query is a sufficient condition to get the
implication
$\mathit{lc}(\mathcal{V}, \mathrm{DB}) \Rightarrow \mathit{gc}(\mathcal{V}, \mathrm{DB}, Q)$.
Our analysis will consider arbitrary conjunctive queries, with any desired set $O$
of output variables, and tree projections w.r.t. arbitrary view systems.

We start by observing that, when arbitrary view systems are considered, it 
suddenly emerges that it does not make sense to talk about "the" core of a query, 
because different isomorphic cores may differently behave with respect to the 
available views. In fact, this phenomenon does not occur, e.g., for generalized
hypertree decompositions (resp., tree decompositions) where all combinations
of $k$ atoms (resp., $k + 1$ variables) are available as views (see Section 2).

Example 3.9 Consider again the query

$Q_4$: $r(A, B) \wedge r(B, C) \wedge r(A, C) \wedge r(D, C) \wedge r(D, B) \wedge r(A, E) \wedge r(F, E)$,

which has been discussed in Example 3.1 and which is graphically reported
again in Figure 3, for the sake of presentation. The figure also reports the
hypergraph $\mathcal{H}_{\mathcal{V}_4}$ associated with the views in
$\mathcal{V}_4$ (where, e.g., the hyperedges $\{A, B, C\}$ and $\{A, F\}$ are
those corresponding to the views $v_1(A, B, C)$ and $v_2(A, F)$, and where
(hyper)edges associated with the query views are still depicted with their
original orientation in $Q_4$, so as to make the correspondence clearer).
Moreover, the figure reports the two queries

$Q_5$: $r(A, B) \wedge r(B, C) \wedge r(A, C)$
$Q_6$: $r(D, B) \wedge r(B, C) \wedge r(D, C)$.

Figure 3: The (hypergraph of the) query $Q_4$, the cores $Q_5$ and $Q_6$, the
hypergraph $\mathcal{H}_{\mathcal{V}_4}$, and the cores of the queries
$Q_4 \wedge \mathit{atom}(\{F, E\})$ (with its tree projection) and
$Q_4 \wedge \mathit{atom}(\{A, F\})$, in Example 3.9.

Note that $Q_5$ and $Q_6$ are two (isomorphic) cores of $Q_4$, but they have
different structural properties. Indeed,
$(\mathcal{H}_{Q_5}, \mathcal{H}_{\mathcal{V}_4})$ admits a tree projection (note
in the figure that the view over $\{A, B, C\}$ "absorbs" the cycle), while
$(\mathcal{H}_{Q_6}, \mathcal{H}_{\mathcal{V}_4})$ does not. $\triangleleft$



Computation Problem. Armed with the observation exemplified above, the 
relationship between consistency and structural properties will be next stated 
by considering the existence of a tree projection for some core of the query Q. 

In addition, to properly deal with arbitrary sets of output variables (which
may be not included in any core of $Q$), we need to define an "output-aware"
notion of covering by tree projections, where cores are forced to contain the
desired output variables.

Definition 3.10 For any set of variables $O$ occurring in some atom
$w \in \mathcal{V}$, define $\mathit{atom}(O)$ to be a fresh atom (with a fresh
relation symbol) over these variables, i.e., such that
$O = \mathit{vars}(\mathit{atom}(O))$. Then, we say that $O$ is tp-covered
in $Q$ (w.r.t. $\mathcal{V}$) if there exists some core $Q'$ of
$Q \wedge \mathit{atom}(O)$ such that
$(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}})$ has a tree projection. □

A first easy observation is that the tp-covered property holds for every set of
variables occurring in every query atom, whenever
$(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$ has a tree projection.

Fact 3.11 Assume that $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$ has a tree
projection. Then, for every $q \in \mathit{atoms}(Q)$ and every
$O \subseteq \mathit{vars}(q)$, $O$ is tp-covered in $Q$ (w.r.t. $\mathcal{V}$).

Proof. Let $q$ be any atom occurring in $Q$ and take any
$O \subseteq \mathit{vars}(q)$. Let $Q'$ be any core of
$Q \wedge \mathit{atom}(O)$. Since $Q'$ is a subquery of
$Q \wedge \mathit{atom}(O)$ and $O \subseteq \mathit{vars}(q)$,
$\mathcal{H}_{Q'} \le \mathcal{H}_Q$. Thus,
$\mathcal{H}_{Q'} \le \mathcal{H}_Q \le \mathcal{H}_a \le \mathcal{H}_{\mathcal{V}}$,
where $\mathcal{H}_a$ is any tree projection of
$(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$, which exists by hypothesis. □

We next show that the above fact may be extended to those atoms occurring
in some core of $Q$ having a tree projection.

Lemma 3.12 Let $q \in \mathit{atoms}(Q')$ be an atom occurring in some core $Q'$
of $Q$ for which $(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}})$ has a tree
projection. Then, for every $O \subseteq \mathit{vars}(q)$, $O$ is tp-covered
in $Q$ (w.r.t. $\mathcal{V}$).

Proof. Let $O \subseteq \mathit{vars}(q)$, and consider the query
$Q \wedge \mathit{atom}(O)$. We first claim that there is a homomorphism from
$Q \wedge \mathit{atom}(O)$ to $Q' \wedge \mathit{atom}(O)$. Indeed, since
$Q' \in \mathit{cores}(Q)$, it is also a retract of $Q$ (see, e.g., [27]); that
is, there is a homomorphism $f$ from $Q$ to $Q'$ which is the identity on its
range (i.e., $f(X) = X$, for every term $X$ occurring in $Q'$). Moreover,
$O \subseteq \mathit{vars}(Q')$, because $q \in \mathit{atoms}(Q')$. It follows
that $f$ is also a homomorphism from $Q \wedge \mathit{atom}(O)$ to
$Q' \wedge \mathit{atom}(O)$. In particular, note that $f$ maps the atom
$\mathit{atom}(O)$ to itself. We thus conclude that $Q' \wedge \mathit{atom}(O)$
is also a core of $Q \wedge \mathit{atom}(O)$, because $\mathit{atom}(O)$ is over
a fresh relation symbol and hence must belong to any core, and dropping atoms
from $Q'$ would contradict the minimality of $Q'$ as a core of $Q$. Finally, since
$\mathit{vars}(\mathit{atom}(O)) = O \subseteq \mathit{vars}(q)$ and
$q \in \mathit{atoms}(Q')$, the hypergraph associated with
$Q' \wedge \mathit{atom}(O)$, say $\mathcal{H}'$, is such that
$\mathcal{H}' \le \mathcal{H}_{Q'}$. Hence, any tree projection of
$\mathcal{H}_{Q'}$ w.r.t. $\mathcal{H}_{\mathcal{V}}$, which exists by
hypothesis, is a tree projection of $\mathcal{H}'$ w.r.t.
$\mathcal{H}_{\mathcal{V}}$. That is, $O$ is tp-covered in $Q$ (w.r.t.
$\mathcal{V}$). □

Example 3.13 Consider again the setting of Example 3.9. The core $Q_5$ contains
the atoms $r(A, B)$, $r(B, C)$, and $r(A, C)$, and we have noticed that $Q_5$
admits a tree projection. Therefore, we can apply Lemma 3.12 to conclude that
the sets of variables $\{A, B\}$, $\{B, C\}$, and $\{A, C\}$ are tp-covered in
$Q_4$.

Consider now the set of variables $\{F, E\}$, which does not occur in any core
of the query, and the novel query $Q_4 \wedge \mathit{atom}(\{F, E\})$. This query
has a unique core, which is again depicted in Figure 3. Notice that this core does
not coincide with any of the two cores of the original query. Yet, it admits a
tree projection, consisting of the hyperedges $\{F, E\}$, $\{A, E\}$, and
$\{A, B, C\}$, as shown in the figure. Thus, $\{F, E\}$ is tp-covered in $Q_4$.

On the other hand, the hypergraphs associated with the cores of
$Q_4 \wedge \mathit{atom}(\{D, C\})$ and $Q_4 \wedge \mathit{atom}(\{D, B\})$ are
precisely the same as the hypergraph $\mathcal{H}_{Q_6}$ associated with the core
$Q_6$, that is, the triangle with vertices $D$, $B$, and $C$, having no tree
projection w.r.t. $\mathcal{H}_{\mathcal{V}_4}$. Hence, $\{D, C\}$ and $\{D, B\}$
are not tp-covered in $Q_4$.

Finally, for an example application of Definition 3.10 with an arbitrary set of
variables (i.e., not just contained in query atoms), consider the set $\{A, F\}$.
Consider then the query $Q_4 \wedge \mathit{atom}(\{A, F\})$ and note that its
core does not have a tree projection. Thus, $\{A, F\}$ is not tp-covered in $Q_4$.
$\triangleleft$

The notion of tp-covering plays a crucial role in establishing consistency
properties. To help the intuition, this role is next exemplified.

Example 3.14 Consider again the setting of Example 13.11 (and Example 13.91) 
and the database DB4 shown in FigureOover the relation symbol r (in Q4) and 
the symbols for the views in V4 — views{Qi) U {vi{A,B,C)tV2{A,F)}. Recall 
from Example 13.31 that V^^'^ is locally consistent. 

Observe that for the query view $w_{r(A,C)}$, $w_{r(A,C)}^{DB_4}$ consists of the
two tuples/homomorphisms $(a_1,c_1)$ and $(a_2,c_2)$. That is, this query view
provides exactly the two homomorphisms in $Q_4^{DB_4}[\{A,C\}]$, i.e., the
answers of $Q_4$ projected over the variables ($A$ and $C$) of the view
$w_{r(A,C)}$. Note that the same property holds for the views over the sets of
variables $\{A,C\}$, $\{A,B\}$, $\{B,C\}$, $\{F,E\}$, $\{A,E\}$, and $\{A,B,C\}$.
Interestingly, each one of these sets is tp-covered in $Q_4$ (see also
Example 3.13).

On the other hand, each one of the sets $w_{r(D,B)}^{DB_4}$, $w_{r(D,C)}^{DB_4}$,
and $v_2(A,F)^{DB_4}$ contains two homomorphisms that do not correspond to any
answer of the query (suitably projected over the variables of interest), which
are those identified by the tuples marked with the symbol "►" in Figure 2. In
fact, we observe that, in this case, $\{D,C\}$, $\{D,B\}$, and $\{A,F\}$ are not
tp-covered in $Q_4$. ◁

In the above example, the fact that homomorphisms that are not correct 
answers are associated with views whose variables are not tp-covered is not by 
chance. Indeed, the intuition is now that to guarantee global consistency by 
just enforcing local consistency, all the variables contained in query atoms must 
be tp-covered. 
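
To make the procedure referred to above concrete, the following is a minimal Python sketch of enforcing local (pairwise) consistency by repeated semijoins until a fixpoint is reached. It is only an illustration under stated assumptions: the representation of a view as a pair (variables, rows) and the names `semijoin` and `enforce_local_consistency` are ours, and the toy triangle instance is hypothetical (it is not the database $DB_4$ of Example 3.14).

```python
def semijoin(left_vars, left_rows, right_vars, right_rows):
    """Keep the rows of `left` that agree with some row of `right`
    on the variables the two views have in common."""
    shared = [v for v in left_vars if v in right_vars]
    left_pos = [left_vars.index(v) for v in shared]
    right_pos = [right_vars.index(v) for v in shared]
    keys = {tuple(row[i] for i in right_pos) for row in right_rows}
    return {row for row in left_rows
            if tuple(row[i] for i in left_pos) in keys}


def enforce_local_consistency(views):
    """views: dict name -> (tuple of variables, set of rows).
    Returns a reduced copy in which every row of every view matches
    at least one row of every other view (a semijoin fixpoint)."""
    reduced = {name: (vs, set(rows)) for name, (vs, rows) in views.items()}
    changed = True
    while changed:
        changed = False
        for a in reduced:
            for b in reduced:
                if a == b:
                    continue
                vars_a, rows_a = reduced[a]
                vars_b, rows_b = reduced[b]
                kept = semijoin(vars_a, rows_a, vars_b, rows_b)
                if kept != rows_a:
                    reduced[a] = (vars_a, kept)
                    changed = True
    return reduced


# Hypothetical triangle instance: already locally consistent, yet no
# assignment to A, B, C satisfies all three views simultaneously.
triangle = {
    "w_AB": (("A", "B"), {("a1", "b1"), ("a2", "b2")}),
    "w_BC": (("B", "C"), {("b1", "c1"), ("b2", "c2")}),
    "w_AC": (("A", "C"), {("a1", "c2"), ("a2", "c1")}),
}
reduced_triangle = enforce_local_consistency(triangle)

# Hypothetical chain instance: the dangling row (3, 4) is removed.
chain = {
    "w_AB": (("A", "B"), {(1, 2), (3, 4)}),
    "w_BC": (("B", "C"), {(2, 5)}),
}
reduced_chain = enforce_local_consistency(chain)
```

On the triangle instance nothing is removed even though the three views have no common solution, which illustrates why some structural condition on the variables of interest is needed before the locally consistent views can be trusted.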

Next, we establish a lemma that actually proves a slightly more general
result dealing with any set of output variables covered by some view. For a
set of variables $O$, let $covers(O)$ denote the set of all views $w \in V$ such
that $O \subseteq vars(w)$.
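
As a small illustration of this definition, the sketch below computes $covers(O)$ for a view system given as a mapping from view names to their variables. The function name `covers` mirrors the notation above, while the dictionary encoding and the sample view names are assumptions made only for this example.

```python
def covers(output_vars, views):
    """Return the names of all views whose variables include `output_vars`.

    views: dict mapping a view name to an iterable of its variables.
    """
    target = set(output_vars)
    return {name for name, variables in views.items()
            if target <= set(variables)}


# Hypothetical view system, loosely inspired by the running example.
sample_views = {
    "w_AB": ("A", "B"),
    "v1": ("A", "B", "C"),
    "v2": ("A", "F"),
}
covering_AB = covers({"A", "B"}, sample_views)
covering_F = covers({"F"}, sample_views)
```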

Lemma 3.15 Assume that $V^{DB}$ is locally consistent. For any set of variables
$O$ that is tp-covered in $Q$, $w^{DB}[O] \subseteq Q^{DB}[O]$ holds, for every
$w \in covers(O)$. Moreover, if $w^{DB}$ is view consistent w.r.t. $Q$, for some
$w \in covers(O)$, then we actually get all the right homomorphisms for all of
them, i.e., $w^{DB}[O] = Q^{DB}[O]$ holds, for every $w \in covers(O)$.

Proof. Let $Q_e = Q \wedge atom(O)$. Assume that $O$ is tp-covered in $Q$, that is,
there exists $Q' \in cores(Q_e)$ for which $(\mathcal{H}_{Q'}, \mathcal{H}_V)$ has a
tree projection. Since $Q'$ is a core, it is also a retract of $Q_e$, that is,
there is a homomorphism $f$ from $Q_e$ to $Q'$ such that $f(X) = X$, for every
term $X$ occurring in $Q'$. Clearly, $f$ is a homomorphism from $Q$ to $Q'$, too.
Then, for every (legal) database DB, $Q'^{DB} \subseteq Q^{DB}[vars(Q')]$.
Moreover, consider the query $wQ'$ where we have query views in place of the
original query atoms, that is,

$$wQ' = atom(O) \wedge \bigwedge_{q \in atoms(Q') \setminus \{atom(O)\}} w_q.$$

Because DB is a legal database, we immediately get
$wQ'^{DB} = Q'^{DB} \subseteq Q^{DB}[vars(Q')]$ and, hence,
$wQ'^{DB}[X] \subseteq Q^{DB}[X]$ holds as well, for any $X \subseteq vars(Q')$.

Now consider any (legal) database DB such that $V^{DB}$ is locally consistent,
and any tree projection $\mathcal{H}_a$ of $(\mathcal{H}_{Q'}, \mathcal{H}_V)$.
Assume w.l.o.g. that $nodes(\mathcal{H}_a) = nodes(\mathcal{H}_{Q'})$ (otherwise,
just drop possible additional variables, and you still get a tree projection of
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$). Observe that $O \subseteq h_O$, for some
hyperedge $h_O$ of $\mathcal{H}_a$. Indeed, $atom(O) \in atoms(Q')$, since
$atom(O)$ is defined on a fresh relation symbol, and thus this atom must occur in
every core of $Q_e$, i.e., $O \in edges(\mathcal{H}_{Q'})$. Let us associate with
$\mathcal{H}_a$ the following query:

$$Q_a = wQ' \wedge \bigwedge_{h \in edges(\mathcal{H}_a)} atom(h).$$

For any fresh atom $atom(h) \in atoms(Q_a)$ (including $atom(O)$), let
$atom(h)^{DB} = v^{DB}[h]$, where $v \in V$ is any view satisfying
$h \subseteq vars(v)$, chosen according to some fixed (arbitrary) criterion. Such
a view always exists because $\mathcal{H}_a$ is a tree projection of
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$.

Note that $Q_a^{DB} \subseteq wQ'^{DB}$, because $wQ'$ is a subquery of $Q_a$. By
construction, $Q_a$ is a simple acyclic query, and $atoms(Q_a)^{DB}$ is locally
consistent because all these relations are projections of views in the locally
consistent set $V^{DB}$. Thus, by the results in [7], $Q_a$ is globally
consistent and we get, for the atom $atom(O)$,
$atom(O)^{DB} = Q_a^{DB}[O] \subseteq wQ'^{DB}[O] \subseteq Q^{DB}[O]$. Moreover,
since $V^{DB}$ is locally consistent, this property must hold for every
$w^{DB}$, with $w \in V$ and $O \subseteq vars(w)$. That is,
$w^{DB}[O] \subseteq Q^{DB}[O]$ holds, for every $w \in covers(O)$.

Assume now that the output variables $O$ are covered by some view consistent
atom, i.e., $O \subseteq vars(w)$ for some $w \in V$ such that
$Q^{DB}[vars(w)] \subseteq w^{DB}$ and thus $Q^{DB}[O] \subseteq w^{DB}[O]$. Since
$V^{DB}$ is locally consistent, it follows that $w^{DB}[O] = atom(O)^{DB}$ and
thus $Q^{DB}[O] \subseteq atom(O)^{DB}$. Combined with the above relationship, we
get the desired equality $Q^{DB}[O] = atom(O)^{DB}$. Again, since $V^{DB}$ is
locally consistent, this property must hold for every $w^{DB}$, with $w \in V$
and $O \subseteq vars(w)$. That is, $Q^{DB}[O] = w^{DB}[O]$, for every
$w \in covers(O)$. □

Since query views are always view consistent (over legal databases), we im- 
mediately get the following sufficient condition for the global consistency, which 
clearly also holds for restricted tree projections corresponding to decomposition 
methods. 

Theorem 3.16 Assume that, for every $q \in atoms(Q)$, $vars(q)$ is tp-covered in
$Q$ (w.r.t. $V$). Then, for every database DB, lc(V, DB) entails gc(V, DB, Q).

Figure 4: Mapping an undirected grid into an edge.



Having a tree projection of the full query is therefore not necessary for
getting global consistency through local consistency. For instance, an
unexpectedly easy class of queries consists of the grid queries of the form
$GQ_n = \bigwedge_{\{X,Y\} \in E_n} (e(X,Y) \wedge e(Y,X))$, where $E_n$ is the
edge set of an $n \times n$ grid. Indeed, while such grids are well-known
obstructions to the existence of tree decompositions, any of their edges is a
core (and, thus, trivially acyclic); see Figure 4. Therefore, even the smallest
possible set of views $V = views(GQ_n)$ is sufficient to obtain global
consistency by enforcing local consistency.
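
The folding of a grid onto one of its edges can be checked mechanically. The sketch below is only an illustration: the encoding of grid nodes as pairs $(i, j)$, the parity-based map `fold`, and all function names are assumptions of ours. It verifies that mapping every node to $(0,0)$ or $(0,1)$ according to the parity of $i + j$ sends each undirected grid edge onto the single edge $\{(0,0), (0,1)\}$ while leaving that edge fixed, i.e., that the map is a retraction onto an edge.

```python
def grid_edges(n):
    """Undirected edges of an n x n grid, each listed once."""
    edges = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                edges.append(((i, j), (i, j + 1)))
    return edges


def fold(node):
    """Map a grid node onto an endpoint of the edge {(0,0), (0,1)}
    according to the parity of its coordinates."""
    i, j = node
    return (0, 0) if (i + j) % 2 == 0 else (0, 1)


def folds_onto_single_edge(n):
    """Check that `fold` maps both orientations of every grid edge
    onto the (symmetric) target edge."""
    target = {((0, 0), (0, 1)), ((0, 1), (0, 0))}
    return all(
        (fold(x), fold(y)) in target and (fold(y), fold(x)) in target
        for x, y in grid_edges(n)
    )


grid_is_foldable = folds_onto_single_edge(5)
```

Adjacent grid nodes always differ in the parity of $i + j$, which is why the check succeeds for every $n \ge 2$.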

As we shall prove in Section 3.3, Theorem 3.16 defines the most general
possible condition to guarantee global consistency, which is what we need to
answer the query by exploiting local consistency if the output variables are
included in some query atom.

Decision Problem. The situation is rather different if we just look for the
most general sufficient conditions to solve the decision problem
$Q^{DB} \neq \emptyset$. In this case, the existence of a tree projection of any
structure for which there is an endomorphism of the query is sufficient. Of
course, any such subquery $Q'$ is homomorphically equivalent to $Q$, denoted by
$Q' \approx_{hom} Q$ in the following. In fact, the concept of tp-covering is
immaterial here, given that we are not interested in output variables (i.e.,
$O = \emptyset$). Thus, as a special case of our analysis on the computation
problem, we get the following result, which generalizes to tree projections
(where cores may behave differently) a similar sufficient condition known for
the special cases of tree decompositions [15] and generalized hypertree
decompositions [12].

Theorem 3.17 Assume there is a subquery $Q' \approx_{hom} Q$ for which
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$ has a tree projection. Then, for every
database DB, lc(V, DB) entails $Q^{DB} \neq \emptyset$.

Proof. Let $\mathcal{H}_a$ be a tree projection of
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$, for some $Q' \approx_{hom} Q$. Then, it is
also a tree projection of $(\mathcal{H}_{Q''}, \mathcal{H}_V)$, for any
$Q'' \in cores(Q') \subseteq cores(Q)$, because
$\mathcal{H}_{Q''} \le \mathcal{H}_{Q'}$. From Lemma 3.12, for any (query atom)
$q \in atoms(Q'')$, $vars(q)$ is tp-covered in $Q$ and thus, from Lemma 3.15,
$Q^{DB}[vars(q)] = w_q^{DB}$. Then, whenever lc(V, DB) holds,
$w_q^{DB} \neq \emptyset$ and hence $Q^{DB} \neq \emptyset$. □

Note that the above condition is more liberal than what we need for having 
the global consistency. In the next section we prove that it is in fact also a 
necessary condition as far as the decision problem is concerned. 

Moreover, we point out that, from an application perspective, either result
above may be useful only if we have some guarantee (or some efficient way to
check) that the required conditions are met. Otherwise, as it happens for the
decision problems in the special cases of (generalized) (hyper)tree
decompositions [12, 34], we are in a promise setting where, in general, we are
not able to
actually compute any full (and thus polynomial-time checkable) query answer 
(or disprove the "promise"). In particular, it has been observed in a slightly 
different setting by [3^ (see, also, [Mllin]) that, rather surprisingly, the global 
consistency property (and hence having a full reducer) is not sufficient to
actually compute a full query answer (unless P = NP). Intuitively, this is due
to the fact that, as soon as we fix some tuple in a relation in order to extend
it to a full solution, we are changing the set of available query endomorphisms
and thus we may lose the property of some variables to be tp-covered. As a
consequence,
subsequent propagations are not guaranteed to maintain the global consistency. 

3.3 ...and Back to Tree Projections 

The question of whether the cases in which local consistency implies global 
consistency precisely coincide with the cases in which there is a tree projection 
of the query with respect to a set of views was a long-standing open problem 
in the literature [IHl SI]- We next answer this question, both in the setting 
considered in those papers (where all relation symbols in the query are distinct), 
with the answer being positive there, and in the unrestricted setting where the 
answer is instead negative. In fact, we precisely characterize the relationships 
between local and global consistency and tree projections in the general setting 
too, by showing that tree projections are still necessary, but not necessarily 
involving the query as a whole. 

Decision Problem. We start with the problem of checking whether the given
query is not empty. Theorem 3.19 below provides the counterpart of
Theorem 3.17. The proof requires some preparation.

Let DB be a database over the vocabulary $\mathcal{DS}$. For the following
results, we assume that each relation symbol $r \in \mathcal{DS}$ of arity $p$ is
associated with a set of $p$ (distinct) attributes that identify the $p$
positions available in $r$. In this context, $r$ is also called relational
schema, and $\mathcal{DS}$ is called database schema. An inclusion dependency is
an expression of the form $r_1[S] \subseteq r_2[S]$, where $r_1$ and $r_2$ are two
relational schemas in $\mathcal{DS}$ and $S$ is a set of attributes that $r_1$
and $r_2$ have in common. A database DB over $\mathcal{DS}$ satisfies this
inclusion dependency if, for each tuple $t_1 \in r_1^{DB}$, there is a tuple
$t_2 \in r_2^{DB}$ with $t_1[S] = t_2[S]$ (where $[\cdot]$ is here the classical
projection relational operator applied to a set of attributes). Moreover, if DB
satisfies each inclusion dependency in a given set $I$, then we simply say that
DB satisfies $I$.
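
The satisfaction condition just stated translates directly into a containment test between projections. The following Python sketch is a hedged illustration: the function name `satisfies_inclusion`, the encoding of a relation as an attribute tuple plus a set of rows, and the sample data are all assumptions introduced here.

```python
def satisfies_inclusion(attrs1, rows1, attrs2, rows2, shared):
    """Check the inclusion dependency r1[S] <= r2[S].

    attrs1, attrs2: tuples naming the attributes (positions) of r1 and r2.
    rows1, rows2:   sets of tuples, the instances of r1 and r2.
    shared:         the attributes S, which must occur in both schemas.
    """
    pos1 = [attrs1.index(a) for a in shared]
    pos2 = [attrs2.index(a) for a in shared]
    available = {tuple(t[i] for i in pos2) for t in rows2}
    return all(tuple(t[i] for i in pos1) in available for t in rows1)


# Hypothetical instance: r1 over (A, B) and r2 over (B, C).
r1_attrs, r1_rows = ("A", "B"), {(1, 2), (3, 4)}
r2_attrs, r2_rows = ("B", "C"), {(2, 9)}

r1_in_r2 = satisfies_inclusion(r1_attrs, r1_rows, r2_attrs, r2_rows, ("B",))
r2_in_r1 = satisfies_inclusion(r2_attrs, r2_rows, r1_attrs, r1_rows, ("B",))
```

Here the value 4 for attribute B occurs in `r1` but not in `r2`, so only the second dependency holds.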

Define $A(\mathcal{DS})$ as the set of canonical atoms associated with the
schema $\mathcal{DS}$, that is, the set containing, for each relation $r$ of
$\mathcal{DS}$, the atom $r(\mathbf{u})$ having as its variables the attributes
of $r$. A conjunctive query $Q$ is said to be a canonical query for
$\mathcal{DS}$ whenever it consists of atoms from $A(\mathcal{DS})$, i.e.,
$atoms(Q) \subseteq A(\mathcal{DS})$ holds.

We are now ready to state a fundamental lemma on unions of conjunctive
queries, i.e., on queries of the form $Q = Q_1 \vee \dots \vee Q_n$, where $Q_i$
is a conjunctive query, $\forall i \in \{1, \dots, n\}$. We are interested in
unions of Boolean queries, so that $Q^{DB} \neq \emptyset$ if (and only if)
$Q_i^{DB} \neq \emptyset$ for some query $Q_i$ in the union $Q$. The ingredients
in the lemma are a recent result on the finite controllability of unions of
conjunctive queries in the framework of databases under the open-world
assumption [43], and a connection between tree projections and the chase
procedure first observed in [44].

Lemma 3.18 Let $\mathcal{DS}$ be a database schema equipped with a set $I$ of
inclusion dependencies. Let $Q$ be a union of canonical queries for
$\mathcal{DS}$ such that, $\forall$ (finite) $DB \neq \emptyset$ over
$\mathcal{DS}$, DB satisfies $I \Rightarrow Q^{DB} \neq \emptyset$. Then, there
exists a conjunctive query $Q'$ in the union $Q$ such that
$(\mathcal{H}_{Q'}, \mathcal{H}_{A(\mathcal{DS})})$ has a tree projection.

Proof. Unlike all other proofs in the paper, we next deal both with finite
and infinite databases, and thus we always point out whether a database is (or
may be) infinite. All databases are implicitly assumed to be over the database
schema $\mathcal{DS}$. From the hypothesis, the following property holds for $Q$:

$P_1$: $\forall$ finite $DB \neq \emptyset$, DB satisfies
$I \Rightarrow Q^{DB} \neq \emptyset$.

Let us start by taking an arbitrary atom $r_w(X_1, \dots, X_m)$ in $Q$, and let
$DB_0 = \{r_w(c_{X_1}, \dots, c_{X_m})\}$, where $c_{X_1}, \dots, c_{X_m}$ are fresh
(distinct) constants. Trivially, $P_1$ entails the following property:

$P_2$: $\forall$ finite $DB \supseteq DB_0$, DB satisfies
$I \Rightarrow Q^{DB} \neq \emptyset$.

Recall that the possibly infinite database $chase(I, DB_0)$ is built from
$DB_0$ by adding iteratively new tuples to satisfy inclusion dependencies in $I$,
until no dependency is violated by the current database (see for instance [1]).
In the following, it is convenient to represent $chase(I, DB_0)$ as a tree $T$
of tuples rooted at $r_w(c_{X_1}, \dots, c_{X_m})$, and where edges are built as
follows. Let $DB_i$ denote the set of all the tuples in $chase(I, DB_0)$
associated with nodes in the first $i$ levels of $T$ (the root is level 0). Let
$r(\mathbf{t})$ be a node of $T$ at level $i$. For each inclusion dependency
$r[A] \subseteq r'[A] \in I$ such that there is no tuple $r'(\mathbf{t}') \in DB_i$
that matches with $r(\mathbf{t})$ over the attributes in $A$, a node
$r'(\mathbf{t}'')$ is added as a child of $r(\mathbf{t})$, where
$r'(\mathbf{t}'')$ is a fresh tuple that matches with $r(\mathbf{t})$ over the
attributes in $A$ and contains fresh constants of the form $c_Y$, for any (other)
attribute $Y \notin A$ in the schema of relation $r'$.
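
The level-by-level construction described above can be sketched as follows. This is a bounded illustration only: since the chase may be infinite, the hypothetical parameter `max_levels` cuts the construction off, and the encoding of facts as pairs (relation, tuple), of inclusion dependencies as triples (source, target, attributes), and of fresh constants as strings `c_Y#k` are all assumptions of this sketch rather than the paper's formalization.

```python
from itertools import count


def bounded_chase(schema, inds, root, max_levels):
    """Build the first levels of the chase tree for inclusion dependencies.

    schema: dict relation -> tuple of attribute names.
    inds:   list of (src, dst, attrs), read as src[attrs] <= dst[attrs].
    root:   the single starting fact, as (relation, tuple of constants).
    Returns (facts, parent), where parent maps each fact to its tree parent.
    """
    fresh = count()
    facts = {root}
    parent = {root: None}
    level = [root]
    for _ in range(max_levels):
        visible = set(facts)  # tuples of the levels built so far (DB_i)
        next_level = []
        for rel, tup in level:
            row = dict(zip(schema[rel], tup))
            for src, dst, attrs in inds:
                if src != rel:
                    continue
                witnessed = any(
                    r == dst
                    and all(dict(zip(schema[dst], t))[a] == row[a] for a in attrs)
                    for r, t in visible
                )
                if witnessed:
                    continue
                # Copy the shared attributes, invent fresh constants elsewhere.
                child = (dst, tuple(
                    row[a] if a in attrs else f"c_{a}#{next(fresh)}"
                    for a in schema[dst]
                ))
                facts.add(child)
                parent[child] = (rel, tup)
                next_level.append(child)
        if not next_level:
            break
        level = next_level
    return facts, parent


# Hypothetical terminating example: r(A,B), s(B,C) with r[B] = s[B].
schema = {"r": ("A", "B"), "s": ("B", "C")}
inds = [("r", "s", ("B",)), ("s", "r", ("B",))]
root = ("r", ("cA", "cB"))
facts, parent = bounded_chase(schema, inds, root, max_levels=10)

# Hypothetical non-terminating example, truncated after five levels.
schema_cyc = {"r": ("A", "B"), "s": ("B", "C"), "t": ("C", "A")}
inds_cyc = [("r", "s", ("B",)), ("s", "t", ("C",)), ("t", "r", ("A",))]
facts_cyc, _ = bounded_chase(schema_cyc, inds_cyc, root, max_levels=5)
```

In the first example the chase stops after adding one tuple for `s`; in the second each level forces one more fresh tuple, which is the behavior that makes the unbounded chase possibly infinite.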

A well-known property of $chase(I, DB_0)$ is that it maps via homomorphism
to any other (possibly infinite) database that satisfies $I$ and includes the
non-empty database $DB_0$. Therefore, whenever
$Q^{chase(I, DB_0)} \neq \emptyset$, the same holds for every database that
satisfies $I$ and includes $DB_0$.

We now use the finite controllability result by Rosati [43] which, applied to
our $Q$, $I$, and $DB_0$, reads as follows: the answer of $Q$ is not empty on
every (possibly infinite) database that satisfies $I$ and includes $DB_0$ if,
and only if, the answer of $Q$ is not empty on every finite database that
satisfies $I$ and includes $DB_0$ (by Theorem 2 in [43]). Therefore, $P_2$
implies the following property:

$P_3$: $Q^{chase(I, DB_0)} \neq \emptyset$.

Because $Q$ is a union of conjunctive queries, this means that there is a query
$Q'$ in $Q$ having a homomorphism $h : vars(Q') \mapsto U_c$ from $Q'$ to
$chase(I, DB_0)$, where $U_c$ is the universe of $chase(I, DB_0)$. In particular,
from a well-known result of Johnson and Klug [36], we may assume, w.l.o.g., that
$h$ maps $Q'$ to a finite subtree $T_f$ of $T$.

Observe now that $h$ is a bijection. Indeed, $DB_0$ contains the one tuple
$r_w(c_{X_1}, \dots, c_{X_m})$ with a distinct constant for each attribute of
$r_w$ and, by definition of $chase(I, DB_0)$, any constant $c_Y$ can never be used
for an attribute different from $Y$. In fact, either $c_Y$ belongs to the
starting tuple and it is then propagated to fresh tuples by the chase
generating-rule, or it is a fresh constant belonging to a tuple created to
satisfy some inclusion dependency (which does not involve attribute $Y$).
Moreover, recall that attributes in $\mathcal{DS}$ are in fact variables in $Q'$,
because the latter is a canonical query. Then, since $h$ is a homomorphism, for
each variable (attribute) $Y$, $h(Y)$ has the form $c_Y$ for some constant $c_Y$
occurring in tuples of $chase(I, DB_0)$.

We now define a labeling $\lambda$, associating each node of $T_f$ with a set of
variables in $vars(Q')$. Let $\Gamma = \{h(X) \mid X \in vars(Q')\}$. For each
vertex $p = r(c_{Y_1}, \dots, c_{Y_n})$ in $T_f$, define $\lambda(p)$ as the set
$\{h^{-1}(c_{Y_i}) \mid c_{Y_i} \in \Gamma\}$. Let $p_1$ and $p_2$ be two vertices
of $T_f$ such that $X \in \lambda(p_1) \cap \lambda(p_2)$ is a variable in
$vars(Q')$. Consider the chase constant $h(X)$, which occurs in $p_1$ and $p_2$
in $T_f$. Let $p_X$ be the top-most vertex of $T_f$ where $h(X)$ occurs. Because
of the chase generating-rule, each node in the path from $p_X$ to $p_1$ (resp.,
$p_2$) contains the constant $h(X)$. Thus, since $T_f$ is a tree, $h(X)$ occurs in
the path between $p_1$ and $p_2$. Therefore, $X$ occurs in the
$\lambda$-labeling of each vertex in this path, too.
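
The property established in this paragraph is the usual connectedness condition of join trees: the vertices whose label contains a given variable must induce a connected subtree. A minimal Python check of that condition is sketched below; the function name `is_join_tree`, the edge-list encoding of the tree, and the sample labelings are hypothetical and serve only to illustrate the condition.

```python
def is_join_tree(tree_edges, labels):
    """Check the connectedness condition on a labeled tree.

    tree_edges: list of (u, v) pairs forming a tree over the keys of `labels`.
    labels:     dict vertex -> set of variables (the labeling of the vertex).
    Returns True if, for every variable, the vertices whose label contains
    it induce a connected subtree.
    """
    adjacency = {vertex: set() for vertex in labels}
    for u, v in tree_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    for variable in set().union(*labels.values()):
        holders = {p for p, label in labels.items() if variable in label}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            vertex = stack.pop()
            # Only walk through vertices that also carry the variable.
            for neighbor in adjacency[vertex] & holders:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if seen != holders:
            return False
    return True


path = [("p1", "p2"), ("p2", "p3")]
good_labels = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"C", "D"}}
bad_labels = {"p1": {"A", "B"}, "p2": {"C"}, "p3": {"A"}}
good_is_join_tree = is_join_tree(path, good_labels)
bad_is_join_tree = is_join_tree(path, bad_labels)
```

In the second labeling the variable A occurs at both ends of the path but not in the middle vertex, so the condition fails.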

Now consider the hypergraph $\mathcal{H}_a$ containing exactly one hyperedge
$\lambda(p)$, for each vertex $p$ of $T_f$, and note that $\mathcal{H}_a$ is
acyclic, because we have actually just shown that the $\lambda$-labeling on
$T_f$ defines a join tree of $\mathcal{H}_a$. Moreover, since $h$ is a
homomorphism from $Q'$ to $chase(I, DB_0)$, for each atom $q \in atoms(Q')$ there
exists a vertex $p = h(q)$ in $T_f$ for which $\lambda(p) = vars(q)$; thus,
$\mathcal{H}_{Q'} \le \mathcal{H}_a$. Finally, by construction, each hyperedge
$\lambda(p)$ in $\mathcal{H}_a$ is built from a tuple
$p = r(c_{Y_1}, \dots, c_{Y_n})$ of $chase(I, DB_0)$, hence a tuple of (the
relation of) some canonical atom $a_r$ in $A(\mathcal{DS})$. Moreover, we
observed that, for each variable $Y_i \in \lambda(p)$,
$h^{-1}(c_{Y_i}) = Y_i \in vars(a_r)$. Then, $\lambda(p) \subseteq vars(a_r)$, and
hence $\mathcal{H}_a \le \mathcal{H}_{A(\mathcal{DS})}$. All in all, we have
shown that, for the query $Q'$ in $Q$, there is a tree projection of
$\mathcal{H}_{Q'}$ w.r.t. $\mathcal{H}_{A(\mathcal{DS})}$. □



^In particular, it is shown that this is equivalent to the condition
$Q^{fchase(I, DB_0, m)} \neq \emptyset$, where $m$ is a finite natural number that
depends on the given instance (including the query) and $fchase(I, DB_0, m)$ is
the so-called finite chase, that is, a non-empty finite database playing the same
role of the (possibly) infinite chase, as far as the evaluation of $Q$ is
concerned.
Theorem 3.19 Assume there is no tree projection of
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$, for each core $Q' \in cores(Q)$. Then, local
consistency does not entail global consistency. In particular, there exists a
(legal) database DB such that lc(V, DB) holds but $Q^{DB} = \emptyset$.

Proof. Recall that we assumed w.l.o.g. that no constants or repeated variables
occur in the views in $V$, while the query $Q$ has no restriction. Moreover, each
view $w \in V$ is over a distinct relation symbol (let us denote it by $r_w$, in
the following), so that there is a one-to-one correspondence between relations
and views. Therefore, $V$ identifies a database schema $\mathcal{DS}$ consisting
of such a relation $r_w$, for each $w \in V$, whose list of attributes is
precisely the list of variables of the view $w$. Thus, $V$ is by construction
the set of canonical atoms associated with $\mathcal{DS}$.

Let us equip $\mathcal{DS}$ with the following set $I$ of inclusion
dependencies: For each pair of views $w, w' \in V$ such that
$S = vars(w) \cap vars(w') \neq \emptyset$, $I$ contains the two inclusion
dependencies $r_w[S] \subseteq r_{w'}[S]$ and $r_{w'}[S] \subseteq r_w[S]$.

Observe that, by the construction of $I$, for each database DB over
$\mathcal{DS}$, lc(V, DB) holds if, and only if, DB satisfies $I$ and
$DB \neq \emptyset$ (recall that $Q$ is connected and $vars(Q) = vars(V)$, hence
$\mathcal{H}_V$ is also connected because $views(Q) \subseteq V$).
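
The set $I$ used in this construction is easy to generate from the view system. The sketch below does so under assumptions of ours: views are given as a dictionary from view names to variable tuples, and each dependency is encoded as a triple (source, target, shared attributes); the name `inclusion_dependencies` is hypothetical.

```python
from itertools import combinations


def inclusion_dependencies(views):
    """For every pair of views sharing variables S, emit both
    dependencies w[S] <= w'[S] and w'[S] <= w[S].

    views: dict view name -> tuple of variables.
    Returns a list of (source, target, shared) triples.
    """
    dependencies = []
    for (w1, vars1), (w2, vars2) in combinations(sorted(views.items()), 2):
        shared = tuple(sorted(set(vars1) & set(vars2)))
        if shared:
            dependencies.append((w1, w2, shared))
            dependencies.append((w2, w1, shared))
    return dependencies


# Hypothetical view system: w3 shares no variable with the others.
example_views = {"w1": ("A", "B"), "w2": ("B", "C"), "w3": ("D",)}
example_deps = inclusion_dependencies(example_views)
```

Requiring both directions for each pair is what makes satisfaction of $I$ coincide with pairwise consistency of the (non-empty) view relations.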

For any set of atoms $D$, let us denote by $\bigwedge D$ the Boolean conjunctive
query defined as the conjunction of all atoms in $D$. Consider the union
$\bigvee_{Q' \in cores(Q)} (\bigwedge views(Q'))$ of (Boolean) canonical queries
for $\mathcal{DS}$ obtained by considering the cores of $Q$, and assume that there
is no tree projection of $(\mathcal{H}_{Q'}, \mathcal{H}_V)$, and hence of
$(\mathcal{H}_{views(Q')}, \mathcal{H}_V)$, for each core $Q' \in cores(Q)$. Then,
by Lemma 3.18, there exists a (finite) database $DB_f \neq \emptyset$ that
satisfies $I$ and such that, $\forall Q' \in cores(Q)$,
$(\bigwedge views(Q'))^{DB_f} = \emptyset$. In particular, because this database
satisfies $I$, lc(V, $DB_f$) holds.

From $DB_f$, let us now build a new legal database instance $DB'_f$ over the
vocabulary including both views and query atoms. This database is obtained
by slightly changing the relations in $DB_f$ in order to keep the information
about the (active) domains of the variables, and by adding the relation
instances for the query atoms in $Q$. Recall that more query atoms may share the
same database relation.

Let $q \in atoms(Q)$ be any query atom defined over a relation symbol $r$ of
arity $p$, and let $r_{w_q}(X_1, \dots, X_n) \in views(Q)$ be the query view $w_q$
associated with $q$. Recall that both constants and repeated variables may occur
in $q$, so that $p \ge n$. Let $r_{w_q}(c_1, \dots, c_n)$ be any tuple in $DB_f$.
Then, $DB'_f$ contains a tuple $r_{w_q}((X_1, c_1), \dots, (X_n, c_n))$ in the
relation instance for the query view $w_q \in V$. Moreover, for the relation $r$,
$DB'_f$ contains a tuple $r(v_1, \dots, v_p)$ defined as follows. For each
$i \in \{1, \dots, p\}$: if some constant term $u_i$ occurs in $q$ at position
$i$, then $v_i = u_i$; if some variable $X_j$ occurs in $q$ at position $i$, then
$v_i = (X_j, c_j)$. Note that this value may occur in $r(v_1, \dots, v_p)$ at
different positions, if $X_j$ occurs more than once in $q$. Moreover, if the
relation $r$ is shared by different query atoms, such a tuple
$r(v_1, \dots, v_p)$ will be available to every atom defined over $r$, besides
$q$. Finally, for any (non-query view) $w$ over a relation $r_w$ and any tuple
$r_w(c_1, \dots, c_n) \in DB_f$, $DB'_f$ contains a tuple
$r_w((X_1, c_1), \dots, (X_n, c_n))$. No further tuples belong to $DB'_f$.

*We remark that the assumption that no constant or repeated variables occur in
views is just for the sake of presentation. If this assumption does not hold, it
is sufficient to define instead a database schema $\mathcal{DS}'$ obtained from
$V$ by removing such useless occurrences, to use its canonical atoms, and to
manage, after the described construction, the correspondence between relations
in $\mathcal{DS}'$ and views in $V$.

As lc(V, $DB_f$) holds, we immediately have that lc(V, $DB'_f$) holds, too. We
now claim that $Q'^{DB'_f} = \emptyset$, for each subquery $Q' \in cores(Q)$,
which entails $Q^{DB'_f} = \emptyset$. Indeed, assume for the sake of
contradiction that there is a core $Q'$ such that $Q'^{DB'_f} \neq \emptyset$, and
let $h'$ be a homomorphism from $Q'$ to $DB'_f$. Define $\pi_1$ and $\pi_2$ to be
the projections mapping a binary tuple $(u, c)$ to its first element $u$ and to
its second element $c$, respectively; moreover, for a plain (term) element $u$,
$\pi_1(u) = \pi_2(u) = u$. In particular, for any tuple $r(v_1, \dots, v_p)$ in
$DB'_f$, where any value $v_i$ is either of the form $(u_i, c_i)$ or of the form
$u_i$ with $u_i$ being a constant term, we have
$\pi_1(r(v_1, \dots, v_p)) = r(\pi_1(v_1), \dots, \pi_1(v_p)) = r(u_1, \dots, u_p)$.
By construction of the tuples in $DB'_f$, the composition $h' \circ \pi_1$ is a
homomorphism from $Q'$ to $Q$ (if we obtain a certain tuple of terms after
applying $\pi_1$, there must exist some query atom with that tuple of terms).
But, since $Q'$ is a core, we have that the image $Q'' = (h' \circ \pi_1)(Q')$ is
also a core in $cores(Q)$, and thus $h' \circ \pi_1$ is actually an isomorphism.
In particular, $h'' = ((h' \circ \pi_1)^{-1} \circ h')$ is now such that
$h''(u_i) = (u_i, c_i)$. In particular, whenever $u_i = X$, for some variable
$X \in vars(Q'')$, $h''(X) = (X, c_i)$. It follows that $h''$ is a homomorphism
from $Q''$ to $DB'_f$. Then, we immediately get that $h'' \circ \pi_2$ is a
homomorphism from $\bigwedge views(Q'')$ to $DB_f$. Indeed, by construction, for
each atom $q \in atoms(Q'')$ defined on a relation $r$, if
$r(v_1, \dots, v_p) \in DB'_f$, then $\pi_2(r_{w_q}(v'_1, \dots, v'_n)) \in DB_f$
(with $(v'_1, \dots, v'_n)$ being the tuple derived from $(v_1, \dots, v_p)$ by
inverting the above construction, i.e., by eliminating constants and repeated
variables). However, the existence of this homomorphism contradicts the fact
that $(\bigwedge views(Q''))^{DB_f} = \emptyset$ holds by the construction of
$DB_f$.

Finally, note that $DB'_f$ is legal. Indeed, for each query view $w_q$, by
construction $w_q^{DB'_f} \subseteq q^{DB'_f}$, and $w_q$ is trivially view
consistent because $Q^{DB'_f} = \emptyset$. □

A consequence of the above result and Theorem 3.17 is the precise
characterization of the power of local consistency, as far as the decision
problem is concerned. This characterization was so far only known for the
special case of treewidth and for structures of fixed arity [6], where, however,
all the cores enjoy the same structural properties (and hence such results are
defined in terms of "the core" of the query).

Corollary 3.20 The following are equivalent: 

(1) For every database DB, lc(V, DB) entails $Q^{DB} \neq \emptyset$.

(2) There is a subquery $Q' \approx_{hom} Q$ for which
$(\mathcal{H}_{Q'}, \mathcal{H}_V)$ has a tree projection.

(3) There is a core $Q''$ of $Q$ for which $(\mathcal{H}_{Q''}, \mathcal{H}_V)$
has a tree projection.

Figure 5: The (hypergraph of the) query $Q_7$, the non-query views in $V_7$, and
their respective tuples in the database $DB_7$ of Example 3.22.

Proof. From Theorem 3.17 we know that (2) implies (1). Theorem 3.19
entails that (1) implies (3). Finally, (3) implies (2) because any core of $Q$ is
homomorphically equivalent to $Q$. □

Finally, we can specialize Corollary 3.20 to the setting of simple queries
(considered in many seminal papers about tree projections, such as [29]), where
every relation symbol occurs at most once in the query and thus the whole query
is its (unique) core.

Corollary 3.21 Let Q be a simple query. Then, the following are equivalent: 

(1) For every database DB, lc(V, DB) entails $Q^{DB} \neq \emptyset$.

(2) $(\mathcal{H}_Q, \mathcal{H}_V)$ has a tree projection.

Example 3.22 Consider the query

$Q_7 : r_1(A,B) \wedge r_2(B,C) \wedge r_3(C,D) \wedge r_4(D,E) \wedge r_5(A,E)$,

the set of views
$V_7 = \{v_1(A,B,E), v_2(B,C,E), v_3(A,C,E), v_4(A,C,D), v_5(A,D,E)\}$, and the
database instance $DB_7$ depicted in Figure 5. It is easy to check that
$(V_7 \cup atoms(Q_7))^{DB_7}$ is locally consistent but
$Q_7^{DB_7} = \emptyset$. Indeed, it can be checked that
$(\mathcal{H}_{Q_7}, \mathcal{H}_{V_7})$ does not have a tree projection. ◁

Computation Problem. We next complete the picture and give the conditions
that precisely characterize those cases where answers of the query over output
variables covered by some view may be immediately obtained by enforcing local
consistency. Again, we start with the problem where we are interested in query
answers over some arbitrary set of output variables. In this case, requiring
that just some view covering $O$ is trustable is sufficient to allow all such
answers to be immediately obtained.

Theorem 3.23 Let O be any set of variables occurring in some view in V. 
Then, the following are equivalent: 

(1) For each database DB such that lc(V, DB) holds,
$w^{DB}[O] \subseteq Q^{DB}[O]$, for every $w \in covers(O)$. If there is a view
consistent $w^{DB}$ with $w \in covers(O)$, then $w^{DB}[O] = Q^{DB}[O]$, for
every $w \in covers(O)$.

(2) The set of variables O is tp-covered in Q (w.r.t. V). 

Proof. First observe that (2) entails (1), by Lemma 3.15. Then, in order to
show that (1) entails (2), it suffices to consider the case where there exists
$Q'' \in cores(Q)$ for which $(\mathcal{H}_{Q''}, \mathcal{H}_V)$ has a tree
projection. Otherwise, we immediately get the contradiction that all views are
incorrect for some database, from Theorem 3.19. Consider the new query
$Q_e = Q \wedge atom(O)$, and assume by contradiction that $O$ is not tp-covered
in $Q$. That is, for every $Q' \in cores(Q_e)$, $(\mathcal{H}_{Q'}, \mathcal{H}_V)$
has no tree projections. We show that there exists a database DB such that
lc(V, DB) holds but $Q^{DB}[O] \subset a^{DB}[O]$, for every $a \in covers(O)$,
where $covers(O) \neq \emptyset$, by hypothesis.

Let $V_e = V \cup \{atom(O)\}$. Since no core of $Q_e$ has tree projections, by
Theorem 3.19 it follows that there is a (non-empty legal) database DB' such
that lc($V_e$, DB') holds, but $Q_e^{DB'} = \emptyset$. Now define a new database
DB such that, for every $a \in V_e$, $a^{DB} = a^{DB'} \cup Q^{DB'}[a]$, and where
the relations in DB' over which the original query atoms are defined are just
copied into DB. By construction, lc($V_e$, DB) holds, because lc($V_e$, DB')
holds and the tuples possibly added to any view are projections of mappings
over the full set of variables, as they are obtained from the total
homomorphisms in $Q^{DB'}$. Moreover, note that only views are modified, as no
tuple is added to the relations over which the original atoms in the query are
defined. Thus, $Q^{DB} = Q^{DB'}$ holds.

Observe that DB is a legal database instance w.r.t. Q. Indeed, the relations 
for query views are still subsets of the relations of the original query atoms (as 
in DB'). Moreover, by construction, they include all tuples that are part of 
some query answer, and thus all query views are view consistent w.r.t. Q. 

Recall now that we are considering the case where some cores of $Q$ have tree
projections, and lc($V_e$, DB) and hence lc(V, DB) hold. From Theorem 3.17, it
follows that $Q^{DB} = Q^{DB'} \neq \emptyset$. However,
$(Q \wedge atom(O))^{DB'} = \emptyset$. It follows that all homomorphisms that
are answers of $Q$ over DB' do not satisfy $atom(O)$, that is,
$Q^{DB'}[O] \cap atom(O)^{DB'} = \emptyset$, and recall that
$atom(O)^{DB'} \neq \emptyset$, because lc($V_e$, DB') holds.

Therefore, we get the proper inclusion $Q^{DB}[O] \subset atom(O)^{DB}$. Indeed,
$atom(O)^{DB'}$ is not empty and all its tuples, which do not belong to
$Q^{DB'}[O] = Q^{DB}[O] \neq \emptyset$, are kept in $atom(O)^{DB}$. Finally,
since $V_e^{DB}$ and hence $V^{DB}$ are locally consistent, this also entails
$atom(O)^{DB} = a^{DB}[O]$ and thus $Q^{DB}[O] \subset a^{DB}[O]$, for each view
$a \in covers(O)$. □



The following corollary is the specialization to the case where we are inter- 
ested in output variables covered by some query atom. 

Corollary 3.24 The following are equivalent: 

(1) For every database DB, lc(V, DB) entails gc(V, DB,Q). 

(2) For each $q \in atoms(Q)$, $vars(q)$ is tp-covered in $Q$ (w.r.t. $V$).

Proof. Since query views cover the variables of query atoms and are always
view consistent w.r.t. $Q$ in any legal database, the statement immediately
follows from Theorem 3.23 and Theorem 3.16. □

The specialization of Corollary 3.24 to the setting where every relation
symbol occurs at most once in the query provides the answer to the question
posed by [29].

Corollary 3.25 Let Q be a simple query. Then, the following are equivalent: 

(1) For every database DB, lc(V, DB) entails gc(V, DB,Q). 

(2) $(\mathcal{H}_Q, \mathcal{H}_V)$ has a tree projection.

Finally, we point out that Theorem 3.23 may be equivalently stated in terms
of any arbitrary (legal) database DB, by considering its reduct red(V, DB)
obtained by enforcing local consistency.

Corollary 3.26 Let O be any set of variables occurring in some view in V. 
Then, the following are equivalent: 

(1) For each database DB, $w^{DB'}[O] \subseteq Q^{DB}[O]$, for every
$w \in covers(O)$, where DB' = red(V, DB). If there is a view consistent
$w^{DB'}$ with $w \in covers(O)$, then $w^{DB'}[O] = Q^{DB}[O]$, for every
$w \in covers(O)$.

(2) The set of variables O is tp-covered in Q (w.r.t. V). 

Proof. (1) ⇒ (2) follows from the corresponding implication (1) ⇒ (2) in
Theorem 3.23, which entails that, whenever $O$ is not tp-covered in $Q$ (w.r.t.
$V$), there exists a locally consistent legal database DB and a view
$w \in covers(O)$ such that $w^{DB}[O] \supset Q^{DB}[O]$. In fact, because it is
locally consistent, DB = red(V, DB) holds.

(2) ⇒ (1) follows from the corresponding implication in Theorem 3.23 and
from the fact that the only tuples occurring in DB and deleted in its reduct
DB' do not participate in any query answer. Therefore, DB' is a legal locally
consistent database. □
4 Application to Structural Decomposition Methods



In this section, we specialize our results about consistency properties and tree
projections to the purely structural decomposition methods described in the
literature (both in the database and in the constraint satisfaction area),
because all of them can be recast in terms of tree projections. In fact, each of
them can be seen as a method to define a suitable set of views to be exploited
for solving the given query answering instance. Here, views represent
subproblems over subsets of variables, whose solutions can be computed
efficiently.

We also provide further results that hold on such special cases only, such as
the positive answer to the question in [12] about $k$-local consistency and
generalized hypertree decomposition, and the precise relationship between
acyclic queries and local consistency, solved in [7] for the simple queries.

4.1 Decomposition Methods and Views 

We start by formalizing the concept of structural decomposition method in our
framework. Let the pair $(Q, DB)$ be any query answering problem instance.
For any subset of variables $S \subseteq vars(Q)$, let $(Q|_S, DB|_S)$ be the
subproblem of $(Q, DB)$ induced by $S$ defined as follows: for each atom
$a \in atoms(Q)$ with $vars(a) \cap S \neq \emptyset$, $Q|_S$ contains an atom $a'$
over a fresh relation symbol $r_{a'}$ having $vars(a) \cap S$ as its set of
variables, and whose database relation is such that $a'^{DB|_S} = a^{DB}[S]$. No
further atom belongs to $Q|_S$, and no further relation belongs to $DB|_S$.
Intuitively, $(Q|_S, DB|_S)$ is the most constrained subproblem of $(Q, DB)$ where
only variables from $S$ occur, because all atoms involving (even partially) those
variables are considered. In particular, for each subquery $Q'$ whose set of
variables is $S$, we have $Q'^{DB} \supseteq Q|_S^{DB|_S} \supseteq Q^{DB}[S]$.
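
The induced subproblem can be computed by projecting every atom that touches $S$ onto the variables it shares with $S$. The sketch below illustrates this under our own assumptions: atoms are encoded as a dictionary from a name to a pair (variables, rows), the primed names for the fresh atoms are a naming convention chosen here, and the sample instance is hypothetical.

```python
def induced_subproblem(atoms, subset):
    """Project each atom touching `subset` onto the variables it shares
    with `subset`; atoms with no variable in `subset` are dropped.

    atoms:  dict name -> (tuple of variables, set of rows).
    subset: set of variables S.
    Returns a dict of fresh atoms (named with a trailing prime).
    """
    restricted = {}
    for name, (variables, rows) in atoms.items():
        keep = [i for i, v in enumerate(variables) if v in subset]
        if not keep:
            continue
        restricted[name + "'"] = (
            tuple(variables[i] for i in keep),
            {tuple(row[i] for i in keep) for row in rows},
        )
    return restricted


# Hypothetical instance with two atoms; only the first involves A.
sample_atoms = {
    "a": (("A", "B"), {(1, 2), (1, 3)}),
    "b": (("C", "D"), {(7, 8)}),
}
sub_A = induced_subproblem(sample_atoms, {"A"})
sub_BC = induced_subproblem(sample_atoms, {"B", "C"})
```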

Definition 4.1 A structural decomposition method DM is a pair of
polynomial-time computable functions v-DM and d-DM that, given a conjunctive
query $Q$ and a database DB', compute, respectively, a view system
$V = \text{v-DM}(Q)$ and a database $DB'' = \text{d-DM}(Q, DB')$ over the
vocabulary of $V$ such that

- the database $DB = DB' \cup DB''$ over the (disjoint) vocabularies of $Q$ and
$V$ is legal;

- for each $w \in V$, $w^{DB} \supseteq Q|_S^{DB|_S}$, where $S = vars(w)$. That
is, any view $w$ contains at least the solutions of the subproblem of $(Q, DB)$
induced by its variables (subproblem completeness). □

Note that the above completeness property is a local property, and clearly
entails the (global) view consistency property for the views in V.

^For the sake of presentation, we do not consider FPT decomposition methods (where
functions v-DM and d-DM are computable in fixed-parameter polynomial time), but our results
can be extended easily to them.
Figure 6: Structures discussed in Example 4.2.



Every known purely structural decomposition method DM, where views
(subproblems) are only determined by the query and do not depend on the database
instance, can be recast this way, with decompositions of Q according to DM being
tree projections of $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$. Indeed, all such methods are in fact
subproblem-based, because any view relation is instantiated with the solutions
${Q'}^{\mathrm{DB}}$ of some subquery Q' (depending on the specific method), which is not
necessarily an induced subproblem. Some exemplifications of the above definition
are discussed below.

Tree Decompositions. For any fixed natural number k, the tree decomposition
method [T71 [H] ($tw_k$) is characterized by the functions v-$tw_k$ and d-$tw_k$
that, given a query Q and a database DB, build the view system v-$tw_k$(Q) and
the database d-$tw_k$(Q, DB). In particular, for each subset S of at most k + 1
variables, there is a view $w_S$ over the variables in S (i.e., $\mathit{vars}(w_S) = S$) whose
tuples are the solutions of the subproblem induced by S (or, more liberally, the
Cartesian product of the sets of constants that variables in S may take). An
illustration of the view set characterizing treewidth is reported below.

Example 4.2 Consider the query

$Q_8 : r_1(A, B) \wedge r_2(B, C) \wedge r_3(A, C) \wedge r_4(C, D)$,

whose associated hypergraph is depicted on the left of Figure 6. Consider the
application on $Q_8$ of the tree decomposition method. The set of views v-$tw_2$($Q_8$)
defined by this method for k = 2 is graphically illustrated on the right of
Figure 6. In fact, the figure shows how $Q_8$ can be covered via an acyclic hypergraph
that consists of two hyperedges covered by two available views, the largest of
which includes three variables. In fact, the treewidth of $Q_8$ is 2. ◁
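As a rough illustration of the view schemas used by the tree decomposition method, the following Python sketch enumerates all subsets of at most k + 1 variables. The function name `tw_view_schemas` and the representation of a view schema as a frozenset of variable names are assumptions made for this example only.

```python
from itertools import combinations


def tw_view_schemas(variables, k):
    """Sketch of the schemas of v-tw_k(Q): every non-empty set of at most
    k + 1 variables yields one view over exactly those variables."""
    ordered = sorted(variables)
    return [frozenset(c)
            for size in range(1, k + 2)
            for c in combinations(ordered, size)]


# Variables of the query in Example 4.2, with k = 2.
schemas = tw_view_schemas({"A", "B", "C", "D"}, 2)
```

For four variables and k = 2 this gives 4 + 6 + 4 = 14 schemas, among them {A, B, C} and {C, D}, which are enough to cover the four atoms of the example query.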

Generalized Hypertree Decompositions. For any fixed natural number k,
the generalized hypertree decomposition method [24] (short: $hw_k$) is
characterized by the functions v-$hw_k$ and d-$hw_k$ that, given a query Q and a
database DB, build the view system v-$hw_k$(Q) and the database d-$hw_k$(Q, DB) where,
for each subquery Q' of Q such that $|\mathit{atoms}(Q')| \leq k$, there is a view $w_{Q'}$ over all
variables in Q' (i.e., $\mathit{vars}(w_{Q'}) = \mathit{vars}(Q')$) and whose tuples are the answers of
Q'. Note that $hw_k$ satisfies the subproblem completeness property too, because
Q' is in general more liberal than the subproblem induced by vars(Q'). Indeed,
the latter also deals with further atoms where such variables occur (possibly
together with other variables not occurring in Q').

Acyclicity. Recall that a hypergraph is acyclic if, and only if, it has
(generalized) hypertree width 1 [24]. Therefore, the acyclicity method (short: acyc)
is just the specialization of the above method for the case of k = 1. In particular,
v-acyc(Q) is precisely the set of query views views(Q).
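The schemas of the views for the generalized hypertree method, and for acyclicity as the case k = 1, can be sketched in the same spirit. In the sketch below a query is assumed to be given just by the variable sets of its atoms, and subqueries with the same set of variables are collapsed into one schema; both choices are simplifications introduced here.

```python
from itertools import combinations


def hw_view_schemas(atom_vars, k):
    """Sketch of the schemas of v-hw_k(Q): one schema for the union of the
    variables of every set of at most k atoms (duplicates collapsed)."""
    schemas = set()
    for size in range(1, k + 1):
        for combo in combinations(atom_vars, size):
            schemas.add(frozenset().union(*combo))
    return schemas


# Atoms of the query in Example 4.2: (A,B), (B,C), (A,C), (C,D).
q8 = [frozenset("AB"), frozenset("BC"), frozenset("AC"), frozenset("CD")]
acyc = hw_view_schemas(q8, 1)   # k = 1: exactly the query views
hw2 = hw_view_schemas(q8, 2)    # k = 2: also unions of two atoms
```

With k = 1 the schemas coincide with the atoms themselves, matching the remark that v-acyc(Q) is the set of query views.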

Fractional Hypertree Decompositions. For any fixed natural number k,
consider the subqueries characterizing the fractional hypertree decomposition
method [5S]: they are defined precisely as in the case of the generalized hypertree
decomposition method, except that a view $w_{Q'}$ is built more generally if, for
a subquery Q', its hypergraph $\mathcal{H}_{Q'}$ has fractional edge-cover number [35] at
most k. Unfortunately, these views may be exponentially many even if k is
a fixed constant, and in fact there is no known polynomial-time algorithm to
decide whether the fractional hypertree width of a hypergraph is at most k.
However, we may still define the required pair of polynomial-time functions
v-$fw_k$ and d-$fw_k$ for this decomposition method, by actually exploiting for their
computation the subproblems identified by Marx in his $O(k^3)$ polynomial-time
approximation of the fractional hypertree width [40]. Moreover, following the
same kind of arguments used for the generalized hypertree decompositions, it
can be seen that the subproblem completeness property is satisfied by such a
pair of functions, too.

Submodular Width. For the sake of completeness, note that the only known
decomposition technique that does not fit the above framework is the one based
on the submodular width [41]. This method is in fact not "purely" structural.
Indeed, according to this technique, a number of view schemas are computed
in fixed-parameter polynomial time (hence not polynomial time, in general) by
looking at the database DB of the given instance, too (while v-DM functions
depend on the query only). Moreover, their associated database relations are
not necessarily subproblem-complete.

4.2 Decomposition Methods and Consistency Properties 

By using Theorem 3.23, it is possible to characterize the power of
local-consistency-based algorithms in structural decomposition methods, as stated
in the following result.^ In fact, this result is not a trivial consequence of
Corollary 3.26, as is evident by contrasting their statements: here, the database
DB'' for the views is computed from a database over the vocabulary of the query Q
only, according to the specific function d-DM characterizing the method DM, while
it is an arbitrary (legal) database in Corollary 3.26.

Theorem 4.3 Let DM be a decomposition method, let Q be a conjunctive query,
and let V = v-DM(Q). The following are equivalent:

^For completeness, we observe that a similar result has been proved in [32] in the setting
of constraint satisfaction problems, by precisely exploiting Theorem 3.23.






(1) For every database DB (over the vocabulary of Q) and for every view
$w \in \mathcal{V}$ with $O \subseteq \mathit{vars}(w)$, $w^{\mathrm{DB}'}[O] = Q^{\mathrm{DB}}[O]$, where DB' = red(V, DB'')
and DB'' = d-DM(Q, DB).

(2) A set of variables $O \subseteq \mathit{vars}(Q)$ is tp-covered in V.

Proof. The fact that (2) ⇒ (1) immediately follows from Corollary 3.26. We
have to show that (1) ⇒ (2) holds as well. Observe that if O is not tp-covered
in Q w.r.t. V = v-DM(Q), by Theorem 3.23 we conclude the existence of a
locally consistent (legal) database $\mathrm{DB} = \mathrm{DB}_Q \cup \mathrm{DB}_{\mathcal{V}}$, with $\mathrm{DB}_Q$ being over the
vocabulary of Q and with $\mathrm{DB}_{\mathcal{V}}$ being over the vocabulary of V, respectively, and
the existence of a view $w \in \mathcal{V}$ such that $w^{\mathrm{DB}}[O] \supset Q^{\mathrm{DB}}[O]$, with $O \subseteq \mathit{vars}(w)$.
Let $\mathrm{DB}'' = \text{d-DM}(Q, \mathrm{DB}_Q)$ be the database comprising the relations for the views
in V built according to method DM, and let DB' = red(V, DB'') be its reduct,
obtained by enforcing local consistency.

We first claim that the database $\mathrm{DB}_{\mathcal{V}}$ is included in DB'', formally, for any
$w \in \mathcal{V}$, $w^{\mathrm{DB}} = w^{\mathrm{DB}_{\mathcal{V}}} \subseteq w^{\mathrm{DB}''}$. Consider such a view $w \in \mathcal{V}$, having
variables $S = \mathit{vars}(w)$, and the subproblem $(Q|_S, \mathrm{DB}|_S)$ of (Q, DB) induced by
S. By construction, only variables from S occur in $Q|_S$; thus, for each
atom $a \in \mathit{atoms}(Q|_S)$, $\mathit{vars}(a) \subseteq S$. It trivially follows that the pair of
hypergraphs $(\mathcal{H}_{Q|_S}, \mathcal{H}_{\mathcal{V}})$ has a tree projection. Let $\mathcal{V}^+ = \mathcal{V} \cup \mathit{views}(Q|_S)$ be the set
of views obtained by adding to V the query views associated with the induced
subproblem, and let $\mathrm{DB}^+$ be the database obtained by adding to DB the
relations in $\mathrm{DB}|_S$, as well as their copies on the relation symbols of the query
views $\mathit{views}(Q|_S)$. Clearly, $\mathrm{DB}^+$ is a legal database (w.r.t. $Q|_S$ and $\mathcal{V}^+$) and $\mathcal{V}^+$
is a view system for $Q|_S$. Moreover, since we just added new views to V, the
pair $(\mathcal{H}_{Q|_S}, \mathcal{H}_{\mathcal{V}^+})$ has a tree projection, too. In particular, from Fact 3.11, S is
tp-covered in $Q|_S$ w.r.t. $\mathcal{V}^+$. Moreover, observe that the database relations for
the new views in $\mathcal{V}^+$ are just projections of the relations of the original query
views, which already belong to V. Therefore, their presence has no impact on
the local consistency property, and $\mathrm{lc}(\mathcal{V}^+, \mathrm{DB}^+)$ holds. By Theorem 3.23, for
every $O' \subseteq S$, we get $w^{\mathrm{DB}}[O'] = w^{\mathrm{DB}^+}[O'] \subseteq Q|_S^{\mathrm{DB}^+}[O']$. That is, $w^{\mathrm{DB}}$ contains only
solutions of the subproblem induced by w. On the other hand, the subproblem
completeness condition entails that $w^{\mathrm{DB}''} \supseteq Q|_S^{\mathrm{DB}|_S}$. Hence the claim follows, as
for any chosen $w \in \mathcal{V}$ with variables $S = \mathit{vars}(w)$, $w^{\mathrm{DB}} \subseteq Q|_S^{\mathrm{DB}|_S} \subseteq w^{\mathrm{DB}''}$.

To conclude, recall that $\mathrm{lc}(\mathcal{V}, \mathrm{DB}_{\mathcal{V}})$ holds, so that $\mathrm{DB}_{\mathcal{V}}$ is a locally
consistent database included in DB'', and thus all its tuples will survive after
enforcing local consistency on DB'', that is, all of them belong to the reduct
DB' = red(V, DB''). Therefore, $w^{\mathrm{DB}} \subseteq w^{\mathrm{DB}'}$, $\forall w \in \mathcal{V}$. In particular, for the view
w and the set of variables $O \subseteq \mathit{vars}(w)$, we get $Q^{\mathrm{DB}}[O] \subset w^{\mathrm{DB}}[O] \subseteq w^{\mathrm{DB}'}[O]$,
hence we get wrong solutions (over O) using the view w with the database
DB'. □

For the decision problem (O = ∅), we get the following special case.
Corollary 4.4 Let DM be a decomposition method, let Q be a conjunctive query,
and let V = v-DM(Q). The following are equivalent:

(1) For every database DB (over the vocabulary of Q), red(V, d-DM(Q, DB)) ≠ ∅
entails $Q^{\mathrm{DB}} \neq \emptyset$.

(2) There is a subquery $Q' \approx_{\mathrm{hom}} Q$ for which $(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}})$ has a tree projection.

(3) There is a core Q'' of Q for which $(\mathcal{H}_{Q''}, \mathcal{H}_{\mathcal{V}})$ has a tree projection.

If we consider decision problem instances (O = ∅) and the treewidth method
(V = v-$tw_k$(Q)), from Corollary 4.4 we (re-)obtain the nice characterization
of |E| about the relationship between k-local consistency and the treewidth of
the core of Q.^

If we consider the generalized hypertree-width (V = v-$hw_k$(Q)), we next
provide the answer to the corresponding open question for the unbounded arity case.
Recall that in [12] it was shown that if the core of Q has generalized
hypertree-width at most k, then the procedure enforcing k-union (of constraints/atoms)
consistency is always correct, i.e., the reduct of the database is not empty if, and
only if, the query has some answer. We next show that this sufficient condition
is necessary, too.

In fact, observe that the following result does not follow immediately from
Corollary 4.4. Indeed, any core Q' of Q may be much smaller than Q, and
thus the set of views v-$hw_k$(Q') available using Q' is in general (possibly much)
smaller than the set of views v-$hw_k$(Q) available when the whole query Q is
considered. For an extreme example, think of the undirected grid (see again
Figure 1), where any edge is a core: in this case, the set of available views
for computing a hypertree decomposition of the core is precisely this one edge
(for any k), while considering the whole query, the available views comprise all
unions of k edges.

This subtle issue is irrelevant for the treewidth method, because such a
technique considers all possible combinations of at most k variables, and clearly only
those variables occurring in the core are useful for computing any of its tree
decompositions. Instead, when generalized hypertree decomposition is considered,
in principle using some particular combination of variables occurring in some
atom outside any core Q' may be necessary for getting a width-k generalized
hypertree decomposition of Q'.

Theorem 4.5 Let Q be a conjunctive query, and let V = v-$hw_k$(Q). The
following are equivalent:

(1) For every database DB (over the vocabulary of Q), red(V, d-$hw_k$(Q, DB)) ≠ ∅
entails $Q^{\mathrm{DB}} \neq \emptyset$.

(2) There is a subquery $Q' \approx_{\mathrm{hom}} Q$ having generalized hypertree-width at most
k.

^As already observed, for treewidth and (generalized) hypertree-width isomorphic sub- 
structures behave in the same way, so that all cores have equivalent properties. Thus, for 
these methods one may simply say "the core" Q' (instead of some core). 
(3) There is a core Q' of Q having generalized hypertree-width at most k.

Proof. It suffices to show that (3) is equivalent to (3') below. Then, the
theorem follows from Corollary 4.4.

(3') There is a core Q' of Q for which $(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}})$ has a tree projection, with
V = v-$hw_k$(Q).

Let V' = v-$hw_k$(Q'). Note that (3) is equivalent to saying that $(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}'})$ has
a tree projection, which entails (3'), because Q' is a subquery of Q and thus
$\mathcal{H}_{\mathcal{V}'} \leq \mathcal{H}_{\mathcal{V}}$.

It remains to show that (3') ⇒ (3). Assume by contradiction that this
is not the case, hence there is a core Q' of Q for which $(\mathcal{H}_{Q'}, \mathcal{H}_{\mathcal{V}})$ has a tree
projection $\mathcal{H}_a$, but every core of Q has generalized hypertree width greater than
k. In particular, this must hold for Q', too. It follows that there exists some
hyperedge h that belongs to $\mathcal{H}_a$ and thus is covered by some hyperedge of $\mathcal{H}_{\mathcal{V}}$
(associated with some view $w \in \mathcal{V}$), but it is not covered by any hyperedge of
$\mathcal{H}_{\mathcal{V}'}$, where V' = v-$hw_k$(Q'). That is, there is no view w' in V' such that
$h \subseteq \mathit{vars}(w')$. Recall that, by definition of function v-$hw_k$, views in V (resp., V')
contain the union of variables from all possible sets of at most k atoms occurring
in Q (resp., Q'). It follows that there is some atom $a \in \mathit{atoms}(Q)$ with
$X = \mathit{vars}(a) \cap h \neq \emptyset$, which does not belong to Q' and whose role in w cannot be
played by any other atom in Q'. Formally, there is no atom $a' \in \mathit{atoms}(Q')$ such
that $X' \subseteq \mathit{vars}(a')$, where $X' = X \cap \mathit{vars}(Q')$.
In fact, note that X' are the only possible crucial variables: further variables of
w not occurring in Q' are never necessary in any tree projection of $\mathcal{H}_{Q'}$ (w.r.t.
any hypergraph), as it is known and easy to see that, if a tree projection exists,
there always exists one that uses only nodes from $\mathcal{H}_{Q'}$ [50].

However, Q' is a core of Q, and thus it is a retract, which means that there
must exist a homomorphism f from Q to Q' where f(X) = X, for each term
X occurring in Q'. Therefore, the atom a should be mapped to some atom
$a' \in \mathit{atoms}(Q')$ that contains all variables f(X) for each $X \in \mathit{vars}(a)$. In
particular, this entails that all variables in X' occur in a', because f is the identity
mapping over them. Contradiction. □

For the special case of k = 1, the above result provides the precise
relationship between local consistency and acyclic queries, extending the classical result
given in [7] for simple queries (in fact, for acyclic schemas). Recall that, for
the acyclic method, the set of views v-acyc(Q) is just the set of query views
views(Q), and their database relations in d-acyc(Q, DB) are just the copies of
their corresponding query atoms.

Theorem 4.6 For any conjunctive query Q, the following are equivalent:

(1) For every database DB (over the vocabulary of Q), red(v-acyc(Q), d-acyc(Q, DB)) ≠ ∅
entails $Q^{\mathrm{DB}} \neq \emptyset$.

(2) There is an acyclic subquery $Q' \approx_{\mathrm{hom}} Q$.

(3) Q has an acyclic core.
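To make the role of the reduct in Theorem 4.6 more tangible, here is a small Python sketch that enforces local consistency by repeated semijoins until a fixpoint is reached. The data layout (a dictionary from view names to a pair of variable tuple and tuple set) and the function names are assumptions chosen for this illustration; the two toy instances are likewise invented here.

```python
def semijoin(r_vars, r_tuples, s_vars, s_tuples):
    """Keep the tuples of r that match at least one tuple of s on shared variables."""
    common = [v for v in r_vars if v in s_vars]
    r_idx = [r_vars.index(v) for v in common]
    s_idx = [s_vars.index(v) for v in common]
    keys = {tuple(t[i] for i in s_idx) for t in s_tuples}
    return {t for t in r_tuples if tuple(t[i] for i in r_idx) in keys}


def enforce_local_consistency(views):
    """views: dict name -> (variables, tuples). Returns the reduced relations."""
    rel = {name: set(tuples) for name, (_, tuples) in views.items()}
    changed = True
    while changed:
        changed = False
        for a, (va, _) in views.items():
            for b, (vb, _) in views.items():
                if a == b:
                    continue
                reduced = semijoin(va, rel[a], vb, rel[b])
                if reduced != rel[a]:
                    rel[a], changed = reduced, True
    return rel


# Acyclic query r1(A,B), r2(B,C): the non-empty reduct certifies an answer.
path = {"r1": (("A", "B"), {(1, 2), (3, 4)}), "r2": (("B", "C"), {(2, 5)})}
path_reduct = enforce_local_consistency(path)

# Cyclic (triangle) query encoding 2-colorability of a triangle: the database is
# already locally consistent and non-empty, yet no assignment over {0, 1}
# satisfies A != B, B != C and A != C, so the query has no answer.
diff = {(0, 1), (1, 0)}
tri = {"r1": (("A", "B"), diff), "r2": (("B", "C"), diff), "r3": (("A", "C"), diff)}
tri_reduct = enforce_local_consistency(tri)
```

The triangle instance shows why some structural condition on the query is needed: a non-empty reduct alone does not guarantee an answer when the query is cyclic.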
5 Larger Islands of Tractability



In this section, we investigate a tractable variant of the notion of tree projections 
that allows us to identify new islands of tractability for query answering, con- 
straint satisfaction problems, and further problems that are easy on tree-like 
structures. Indeed we argue that, in practical database applications, "blind" 
local-consistency enforcing procedures are hardly used, because the number of 
semijoin operations to be performed depends on the database size and may be 
very high. On the other hand, if one is able to compute a tree projection, then 
the views to be processed will be only those involved in the tree projection, and 
the number of semijoin operations to be performed will be at most the number 
of these views (hence, independent of the database). 

The new notion is based on the game characterization of tree projections
proposed in [30]. To formalize our results, we need to introduce some additional
definitions and notations, which will be intensively used in the following.

Assume that a hypergraph $\mathcal{H}$ is given. Let V, W, and {X, Y} be sets of
nodes. Then, X is said [V]-adjacent (in $\mathcal{H}$) to Y if there exists a hyperedge
$h \in \mathit{edges}(\mathcal{H})$ such that $\{X, Y\} \subseteq (h - V)$. A [V]-path from X to Y is a
sequence $X = X_0, \ldots, X_\ell = Y$ of nodes such that $X_i$ is [V]-adjacent to $X_{i+1}$,
for each $i \in [0 \ldots \ell - 1]$. We say that X [V]-touches Y if X is [∅]-adjacent to
$Z \in \mathit{nodes}(\mathcal{H})$, and there is a [V]-path from Z to Y; similarly, X [V]-touches
the set W if X [V]-touches some node $Y \in W$. We say that W is [V]-connected
if $\forall X, Y \in W$ there is a [V]-path from X to Y. A [V]-component (of $\mathcal{H}$) is a
maximal [V]-connected non-empty set of nodes $W \subseteq (\mathit{nodes}(\mathcal{H}) - V)$. For any
[V]-component C, let $\mathit{edges}(C) = \{h \in \mathit{edges}(\mathcal{H}) \mid h \cap C \neq \emptyset\}$, and for a set
of hyperedges $H \subseteq \mathit{edges}(\mathcal{H})$, let $\mathit{nodes}(H)$ denote the set of nodes occurring
in H, that is $\mathit{nodes}(H) = \bigcup_{h \in H} h$. For any component C of $\mathcal{H}$, we denote by
$\mathrm{Fr}(C, \mathcal{H})$ the frontier of C (in $\mathcal{H}$), i.e., the set $\mathit{nodes}(\mathit{edges}(C))$.^ Moreover,
$\partial(C, \mathcal{H})$ denotes the border of C (in $\mathcal{H}$), i.e., the set $\mathrm{Fr}(C, \mathcal{H}) \setminus C$. Note that
$C_1 \subseteq C_2$ entails $\mathrm{Fr}(C_1, \mathcal{H}) \subseteq \mathrm{Fr}(C_2, \mathcal{H})$.

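In the following sections, given any pair of hypergraphs $(\mathcal{H}_1, \mathcal{H}_2)$ and a set
of nodes $C \subseteq \mathit{nodes}(\mathcal{H}_1)$, we write for short Fr(C) and $\partial C$ to denote $\mathrm{Fr}(C, \mathcal{H}_1)$ and
$\partial(C, \mathcal{H}_1)$, respectively.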
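The notions of [V]-component, frontier, and border are easy to compute. The Python sketch below is one possible rendering, assuming a hypergraph is given as a list of frozensets (its hyperedges) plus its node set; the function names are chosen here for illustration.

```python
def components(edges, nodes, V):
    """All [V]-components: maximal sets of nodes outside V that are linked by
    chains of hyperedges, where nodes in V cannot be traversed."""
    V = frozenset(V)
    remaining = set(nodes) - V
    result = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for h in edges:
                if x in h:
                    for y in h - V:  # y is [V]-adjacent to x
                        if y not in comp:
                            comp.add(y)
                            stack.append(y)
        remaining -= comp
        result.append(frozenset(comp))
    return result


def frontier(edges, C):
    """Fr(C): all nodes of the hyperedges that intersect C."""
    return frozenset().union(*[h for h in edges if h & C])


def border(edges, C):
    """Border of C: the frontier of C minus C itself."""
    return frontier(edges, C) - C


# A path A - B - C - D, with a cop standing on B.
edges = [frozenset("AB"), frozenset("BC"), frozenset("CD")]
nodes = frozenset("ABCD")
comps = set(components(edges, nodes, {"B"}))
```

Blocking B splits the path into the components {A} and {C, D}; both have B as their border.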

5.1 Game-Theoretic Characterization 

The Robber and Captain game is played on a pair of hypergraphs $(\mathcal{H}_1, \mathcal{H}_2)$
by a Robber and a Captain controlling some squads of cops, in charge of the
surveillance of a number of strategic targets. The Robber stands on a node and
can run at great speed along the edges of $\mathcal{H}_1$. However, (s)he is not permitted
to run through a node that is controlled by a cop. Each move of the Captain
involves one squad of cops, which is encoded as a hyperedge $h \in \mathit{edges}(\mathcal{H}_2)$.
The Captain may ask some cops in the squad h to run in action, as long as they

*The choice of the term "frontier" to name the union of a component with its outer border 
is due to the role that this notion plays in the hypergraph game described in the subsequent 
section. 
occupy nodes that are currently reachable by the Robber, thereby blocking an
escape path for the Robber. Thus, "second-line" cops cannot be activated by
the Captain. Note that the Robber is fast and may see cops that are entering in
action. Therefore, while cops move, the Robber may run through those positions
that are left by cops or not yet occupied. The goal of the Captain is to place a
cop on the node occupied by the Robber, while the Robber tries to avoid her/his
capture.

Definition 5.1 Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two hypergraphs. The Robber and Captain
game on $(\mathcal{H}_1, \mathcal{H}_2)$ is formalized as follows. A position for the Captain
is a pair (h, M) where h is a hyperedge of $\mathcal{H}_2$ and $M \subseteq h$. A configuration
is a triple (h, M, C), where (h, M) is a position for the Captain, and C
is the [M]-component where the Robber stands.^ The initial configuration is
$(\emptyset, \emptyset, \mathit{nodes}(\mathcal{H}_1))$.

A strategy $\sigma$ is a function that encodes the moves of the Captain. Its domain
includes the initial configuration. For each configuration $v_p = (h_p, M_p, C_p)$ in
the domain of $\sigma$, $\sigma(v_p) = (h_r, M_r)$, with $M_r \subseteq h_r \cap \mathrm{Fr}(C_p)$, is the novel position
for the Captain. After this move, the Robber can select any $[v_p, M_r]$-option,
i.e., any $[M_r]$-component $C_r$ such that $C_p \cup C_r$ is $[M_p \cap M_r]$-connected. If there
is no $[v_p, M_r]$-option, then $(h_r, M_r, \emptyset)$ is said a capture configuration induced
by $\sigma$. The move of the Captain is monotone if, for each $[v_p, M_r]$-option $C_r$,
$C_r \subseteq C_p$. The domain of $\sigma$ includes the configuration $(h_r, M_r, C_r)$, for each
$[v_p, M_r]$-option $C_r$. No other configuration is in the domain of $\sigma$. The strategy
$\sigma$ is monotone if it encodes only monotone moves over the configurations in its
domain.

A strategy $\sigma$ can be represented as a directed graph $G(\sigma) = (N, A)$, called
strategy graph, as follows. The set N of nodes is the set of all configurations
in the domain of $\sigma$ plus all capture configurations induced by $\sigma$. If
$v_p = (h_p, M_p, C_p)$ is a configuration and $\sigma(v_p) = (h_r, M_r)$, then A contains an arc
from $v_p$ to $(h_r, M_r, C_r)$ for each $[v_p, M_r]$-option $C_r$, and to $(h_r, M_r, \emptyset)$ if there
is no $[v_p, M_r]$-option. We say that $\sigma$ is a winning strategy (for the Captain) if
$G(\sigma)$ is acyclic. Otherwise, i.e., if $G(\sigma)$ contains a cycle, then the Robber can
avoid her/his capture forever. □

Example 5.2 Consider the two hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$ reported in Figure 7,
together with the strategy graph $G(\sigma)$. The graph encodes a winning strategy
$\sigma$ for the Captain. From the initial configuration $(\emptyset, \emptyset, \mathit{nodes}(\mathcal{H}_1))$, the Captain
activates all the cops in the hyperedge {A, C, D, E, G}, so that the Robber has
two available options, i.e., {B} and {F}. In the former (resp., latter) case,
the Captain activates all the cops in the hyperedge {B, C} (resp., {E, F}), so
that the Robber has necessarily to occupy the node A (resp., G). Finally, the
Captain activates the cops in {A, B} (resp., {F, G}) and captures the Robber.

^It is easy to see that in such games, being the robber arbitrarily fast, what matters is not
the precise node where the robber stands, but just the [M]-component where (s)he is free to
move.






Figure 7: The hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$, plus the graph $G(\sigma)$ in Example 5.2.

Figure 8: A tree projection $\mathcal{H}_a$ for the pair in Example 5.2, plus the graph $G(\sigma)$.

Note that the strategy $\sigma$ is non-monotone, because the Robber is allowed to
return on A and G, after these nodes have been previously occupied by the
Captain in the first move. ◁

In the above example, the hyperedge {A, C, D, E, G} of $\mathcal{H}_2$ "absorbs" the
cycle in $\mathcal{H}_1$, so that it is easily seen that there is a tree projection $\mathcal{H}_a$ of $\mathcal{H}_1$
w.r.t. $\mathcal{H}_2$ (see Figure 8). The fact that on this pair the Captain has a winning
strategy is not by chance.

Theorem 5.3 ([30]) There is a tree projection of $\mathcal{H}_1$ w.r.t. $\mathcal{H}_2$ if, and
only if, there is a winning strategy in the Captain and Robber game played on
$(\mathcal{H}_1, \mathcal{H}_2)$.

Recall that the winning strategy in Example 5.2 is not monotone. However,
an important property of this game is that there is no incentive for the Captain
to play a strategy that is not monotone.

Theorem 5.4 (cf. [30]) In the Captain and Robber game played on the pair
$(\mathcal{H}_1, \mathcal{H}_2)$, a winning strategy exists if and only if a monotone winning strategy
exists.

Moreover, from any monotone winning strategy, a tree projection of $\mathcal{H}_1$
w.r.t. $\mathcal{H}_2$ can be computed in polynomial time.

Example 5.5 Consider again the setting of Example 5.2 and the strategy
graph $G(\sigma)$ shown in Figure 8. Note that the strategy $\sigma$ is monotone, and
in fact the moves of the Captain one-to-one correspond with the hyperedges in
the tree projection $\mathcal{H}_a$. ◁
The crucial properties to establish Theorem 5.4 are next recalled, as they
will be useful in our subsequent analysis too. Let $\sigma$ be a strategy, and let
$v_p = (h_p, M_p, C_p)$ and $v_r = (h_r, M_r, C_r)$ be two configurations in its domain
such that $\sigma(v_p) = (h_r, M_r)$ and $C_r$ is a $[v_p, M_r]$-option. Let $\sigma(v_r) = (h_s, M_s)$
and define $\mathrm{ED}((M_r, C_r), M_s) = M_r \cap \mathrm{Fr}(C_r) \setminus M_s$ (which is equivalent to $\partial C_r \setminus M_s$
because $C_r$ is an $[M_r]$-component) as the escape-door of the Robber in $v_r$ when
attacked with $M_s$. From [30], a move is monotone if, and only if, such an
escape door is empty; in particular, $\sigma(v_r)$ is non-monotone if (and only if)

$\mathrm{ED}((M_r, C_r), M_s) \neq \emptyset$.

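Let $M'_r = M_r \setminus \mathrm{ED}((M_r, C_r), M_s)$, let $C'_r$ be the $[M'_r]$-component with
$C_r \cup \mathrm{ED}((M_r, C_r), M_s) \subseteq C'_r$, which exists since $\mathrm{ED}((M_r, C_r), M_s) \subseteq \mathrm{Fr}(C_r)$ and
$M'_r \subseteq M_r$, and let $v'_r = (h_r, M'_r, C'_r)$. Finally, consider the following strategy $\sigma'$:

$$\sigma'(h, M, C) = \begin{cases} (h_r, M'_r) & \text{if } (h, M, C) = v_p, \\ (h_s, M_s) & \text{if } (h, M, C) = v'_r, \\ \sigma(h, M, C) & \text{otherwise.} \end{cases} \qquad (1)$$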

For such a state of the game, a number of technical properties have been
proved in [30]. We summarize them in the following lemma.

Lemma 5.6 ([30]) The following properties hold:

(1) $\mathrm{ED}((M'_r, C'_r), M_s) = \emptyset$.

(2) For each $[v_p, M_r]$-option C, either $C \subseteq C'_r$ or C is a $[v_p, M'_r]$-option.

(3) For each $[v_p, M'_r]$-option $C \neq C'_r$, C is a $[v_p, M_r]$-option.

(4) A set C is a $[v_r, M_s]$-option if, and only if, it is a $[v'_r, M_s]$-option.

(5) If $\sigma$ is a winning strategy, then $\sigma'$ is a winning strategy too.



5.2 Greedy Strategies 

Since winning strategies correspond to tree projections, there is no efficient 
algorithm for their computation. Indeed, just recall that deciding the existence 
of a tree projection is not feasible in polynomial time, unless P = NP [^. Our 
goal is then to focus on certain "greedy" strategies that are easy to compute. 
Intuitively, in greedy strategies it is required that all cops available at the current 
squad hp and reachable by the Robber enter in action. If all of them are in 
action, then a new squad hr is selected, again requiring that all the active cops, 
i.e., those in the frontier, enter in action. 

Definition 5.7 On the Captain and Robber game played on $(\mathcal{H}_1, \mathcal{H}_2)$, a strategy
$\sigma$ is greedy if, for any configuration $v_p = (h_p, M_p, C_p)$ in the domain of $\sigma$, the
next position $\sigma(v_p) = (h_r, M_r)$ is such that $M_r = h_r \cap \mathrm{Fr}(C_p)$, where $h_r = h_p$ if
$h_p \cap C_p \neq \emptyset$, and $h_r$ is any squad in $\mathit{edges}(\mathcal{H}_2)$ if $h_p \cap C_p = \emptyset$. □
Given such a greedy way to select cops at each step, observe that the former
case ($h_p \cap C_p \neq \emptyset$) may only occur if the Robber is able to come back to some
position previously controlled by the Captain. Greedy winning strategies are
indeed non-monotone in general, and for some pair of hypergraphs it is possible
that there is no monotone winning greedy strategy, although monotone winning
strategies (non-greedy) exist.

Example 5.8 Consider again the hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$ shown in Figure 7,
and recall that the strategy graph of a monotone winning strategy $\sigma$ is depicted
in Figure 8. However, there is no monotone greedy strategy in this case. Indeed,
if at the beginning of the game the Captain asks the squad {A, C, D, E, G} to
enter in action and the Robber goes on B, then in the next move the Captain is
forced to lose the control on A in order to move on {C, B} and eventually win
via {B, A} (see again Figure 7). On the other hand, if the attack of the Captain
starts on either side, say on the left branch, the Captain has then to attack
the component that includes the triangle and the other branch. At this point,
the only available greedy choice is to use the big squad and hence to employ cops
{C, D, E, G}. However, as in the previous case, G will be later (necessarily) left
free to the Robber, in order to win the game. ◁

We now show that, differently from arbitrary strategies, the existence of
greedy winning strategies can be decided in polynomial time. To establish the
result, a useful technical property is that greedy strategies can only involve a
polynomial number of configurations. Let us denote by $\mathrm{MaxGreedyStrat}(\mathcal{H}_1, \mathcal{H}_2)$
the maximum domain cardinality over any greedy strategy in the Robber and
Captain game on a pair $(\mathcal{H}_1, \mathcal{H}_2)$.

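Lemma 5.9 Let $(\mathcal{H}_1, \mathcal{H}_2)$ be a pair of hypergraphs. Then, $\mathrm{MaxGreedyStrat}(\mathcal{H}_1, \mathcal{H}_2)$
is at most $|\mathit{edges}(\mathcal{H}_2)| \times |\mathit{nodes}(\mathcal{H}_1)| \, (|\mathit{edges}(\mathcal{H}_2)| \times |\mathit{nodes}(\mathcal{H}_1)| + 1) + 1$.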

Proof. Let $\sigma$ be a greedy strategy, and let $v_p = (h_p, M_p, C_p)$ be a configuration
in its domain. Note that the only configuration where $h_p = M_p = \emptyset$ is the
starting configuration $(\emptyset, \emptyset, \mathit{nodes}(\mathcal{H}_1))$, which is taken into account by the final
+1 in the statement. Therefore, we next assume $M_p \neq \emptyset$.
Consider the case where $h_p \cap C_p = \emptyset$. In this case, a new squad
$h_r \in \mathit{edges}(\mathcal{H}_2)$ is chosen by the Captain according to $\sigma$. Since $C_p$ is an
$[M_p]$-component and thus $\partial C_p \subseteq M_p \subseteq h_p$, we get that this case occurs only if $C_p$ is
actually an $[h_p]$-component, too. Such a component is uniquely identified by any
pair of the form $(h_p, X_p)$ such that $X_p \in \mathit{nodes}(\mathcal{H}_1)$ is a representative of the
component (e.g., the node in $C_p$ having the smallest position according to any fixed
ordering over the nodes). It follows that the new set of cops $M_r = h_r \cap \mathrm{Fr}(C_p)$
is uniquely determined by $h_r$ and $C_p$ and thus may be identified through a
triple $(h_r, h_p, X_p)$. Thus, the maximum number of such sets $M_r$ of cops is
$|\mathit{edges}(\mathcal{H}_2)|^2 \times |\mathit{nodes}(\mathcal{H}_1)|$. Moreover, the possible configurations $(h_r, M_r, C_r)$
following $(h_p, M_p, C_p)$ in the game where the Captain plays according to $\sigma$ are
identified by quadruples of the form $(h_r, h_p, X_p, X_r)$, where $h_r$ is used both
to identify itself and to determine the set $M_r$ together with $h_p$ and $X_p$, and
Boolean function GreedyWinningStrategy($h_p$, $M_p$, $C_p$, $i$);

/* $(h_p, M_p, C_p)$ is an extended configuration over $(\mathcal{H}_1, \mathcal{H}_2)$,
$i \geq 0$ is a natural number */

1) if $i > \mathrm{MaxGreedyStrat}(\mathcal{H}_1, \mathcal{H}_2)$, then return False;

2) if $h_p \cap C_p \neq \emptyset$, then let $h_r = h_p$;

else guess a hyperedge $h_r \in \mathit{edges}(\mathcal{H}_2)$;

3) let $M_r = h_r \cap \mathrm{Fr}(C_p)$;

4) for each $[(h_p, M_p, C_p), M_r]$-option $C_r$ do

if not GreedyWinningStrategy($h_r$, $M_r$, $C_r$, $i + 1$), then return
False;

5) return True;

Figure 9: GreedyWinningStrategy.

where $X_r$ is a representative of the $[M_r]$-component. In fact, if there is no
$[v_p, M_r]$-option, then $X_r$ is a distinguished element not in $\mathit{nodes}(\mathcal{H}_1)$ (or some
element in $M_p$ occupied by some cop) meaning that the only configuration
following $(h_p, M_p, C_p)$ is $(h_r, M_r, \emptyset)$ where the Robber is captured. Overall, the
maximum number of such configurations is $|\mathit{edges}(\mathcal{H}_2)|^2 \times |\mathit{nodes}(\mathcal{H}_1)|^2$.

Finally, consider the case where $h_p \cap C_p \neq \emptyset$. In this case, $M_r = h_p \cap \mathrm{Fr}(C_p)$.
Since $C_p$ is an $[M_p]$-component, $\partial C_p \subseteq M_p \subseteq h_p$. It follows that the new nodes
from $\mathrm{Fr}(C_p)$ to be included in $M_r$ belong to $C_p$, that is, we may also write
$M_r = M_p \cup (h_p \cap C_p)$. Note that no configuration of the game following this
one can be of this type. Indeed, every $[M_r]$-component $C_r$ where the Robber
may go from $C_p$ will be a subset of $C_p$ (because $\partial C_p \subseteq M_p \subseteq M_r \subseteq h_p$), and
will have no intersection with $h_p$. As a further consequence, such a $C_r$ must be an
$[h_p]$-component. By contradiction, if there is some node $X_p \in C_r \subseteq C_p$ that is
$[M_r]$-connected to some $X_r$ in $h_p \setminus M_r$, then $X_p$ is also $[M_p]$-connected to $X_r$.
However, this is impossible because $X_p$ is also in $C_p$ and hence $X_r$ would be
in $C_p$, too, and hence in $h_p \cap C_p$ and in $M_r$, by construction. Therefore, the
possible configurations $(h_p, M_r, C_r)$ following $(h_p, M_p, C_p)$ in the game where
the Captain plays according to $\sigma$ are identified by pairs of the form $(h_p, X_p)$,
where $X_p \in \mathit{nodes}(\mathcal{H}_1)$ is the representative of the $[h_p]$-component $C_r$ (and
where $M_r$ is computed from them). As above, if there is no $[v_p, M_r]$-option,
then $X_p$ is a distinguished element witnessing that the configuration is a
capture configuration of the form $(h_p, M_r, \emptyset)$. Overall, the maximum number of
such configurations is $|\mathit{edges}(\mathcal{H}_2)| \times |\mathit{nodes}(\mathcal{H}_1)|$. □
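The counting in the proof can be summarized in one line. The shorthand $e = |\mathit{edges}(\mathcal{H}_2)|$ and $n = |\mathit{nodes}(\mathcal{H}_1)|$ is introduced here only for readability; the three terms are the initial configuration and the two cases analyzed above.

```latex
\underbrace{1}_{\text{initial configuration}}
\;+\; \underbrace{e^{2} n^{2}}_{h_p \cap C_p = \emptyset}
\;+\; \underbrace{e\, n}_{h_p \cap C_p \neq \emptyset}
\;=\; e\, n\,(e\, n + 1) + 1 .
```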

To see that the existence of a winning greedy strategy is decidable in
polynomial time, consider the GreedyWinningStrategy algorithm illustrated in
Figure 9, which receives as input a configuration $(h_p, M_p, C_p)$ for the Robber
and Captain game, plus a "level" i. Note that this algorithm is a high-level
specification of an alternating Turing machine, say $M_g$ [23]. After the first
step, where we check that the number of recursive calls has not exceeded the
number of all distinct configurations, the algorithm reveals its
non-deterministic nature. Indeed, it guesses a hyperedge $h_r$ corresponding to the
next move of the Captain (existential step of $M_g$). Eventually, it returns True
if, and only if, the recursive calls GreedyWinningStrategy($h_r$, $M_r$, $C_r$, $i + 1$)
with $M_r = h_r \cap \mathrm{Fr}(C_p)$ succeed on each $[(h_p, M_p, C_p), M_r]$-option (universal step
of $M_g$).

Theorem 5.10 Deciding the existence of a greedy winning strategy in the Rob- 
ber and Captain game is feasible in polynomial time. 

Proof. Let $(\mathcal{H}_1, \mathcal{H}_2)$ be a pair of hypergraphs, and consider the execution of
the Boolean function GreedyWinningStrategy on the starting
configuration $(\emptyset, \emptyset, nodes(\mathcal{H}_1), 0)$. Due to its non-deterministic nature, it is easily seen
that, by getting rid of step (1), it returns True if, and only if, the Captain has a
greedy winning strategy in the game played on $(\mathcal{H}_1, \mathcal{H}_2)$ (which we assume to be
"visible" to the function at every call, to avoid a longer signature). Moreover,
we claim that the check performed at step (1) cannot lead to a wrong False
output. Indeed, just observe that the number of recursive calls is bounded by
the number of all distinct configurations, which is at most
MaxGreedyStrat($\mathcal{H}_1, \mathcal{H}_2$), by Lemma 5.9. Therefore, if the recursion level $i$ exceeds this threshold,
then we can safely answer False.

Let us now focus on the running time. We have already observed that
GreedyWinningStrategy may be implemented on an alternating Turing
machine $\mathcal{M}_G$, whose existential steps correspond to the guess statements at
step 2, while universal steps are used for checking that the conditions at step 4
are satisfied by all the relevant components. In addition, by indexing the various
data structures and by referring to each component via one point contained in it
(selected through any fixed criterion), the machine can be implemented to use
logarithmically many bits on its worktape. For instance, recall from the proof of
Lemma 5.9 that every configuration is identified by at most four elements of the
form $(h_p, h_r, X_p, X_r)$ with $h_p, h_r \in edges(\mathcal{H}_2)$ and $X_p, X_r \in nodes(\mathcal{H}_1)$.
Therefore, any configuration may be encoded by (at most) four indexes whose
maximum size is $\log \max\{|edges(\mathcal{H}_2)|, |nodes(\mathcal{H}_1)|\}$. Moreover, the check at step (1)
ensures that the length of each branch of the computation tree of $\mathcal{M}_G$ is finite,
and actually bounded by a polynomial in the size of the input. For the sake of
completeness, observe that all subtasks in the function, such as computing
connected components and the like, are easily implementable in nondeterministic
logspace, so that such tasks just correspond to further (polynomially-bounded)
branches of the computation tree of $\mathcal{M}_G$. Thus, GreedyWinningStrategy
may be implemented on a log-space alternating Turing machine, which
immediately entails the result, because Alternating Logspace is equal to Polynomial
Time [H]. □

It is well known that a log-space alternating Turing machine such as $\mathcal{M}_G$ can be
simulated by a standard machine in polynomial time. First, compute the polynomially-many
possible instant descriptions (IDs) of the machine, and build a graph
representing the possible connections between any pair of IDs, according to its transition
relation. Then, evaluate this graph along some topological ordering as follows.
Mark all IDs without outgoing arcs associated with final accepting states; then
mark all IDs associated with existential states having a marked successor, or
associated with universal states and whose successors are all marked. Then,
the machine $\mathcal{M}_G$ accepts its input if, and only if, the starting ID is marked.
Moreover, the subgraph induced by the marked nodes encodes its accepting
computations.
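
The marking procedure just described can be illustrated by a minimal sketch. It is not the authors' implementation: the function name `mark_accepting_ids`, the dictionary-based graph encoding, and the state labels `'exists'`, `'forall'`, `'accept'`, `'reject'` are assumptions made here for illustration only. Instead of fixing a topological ordering in advance, the sketch propagates marks backwards with a worklist, which visits the IDs in an order compatible with one.

```python
from collections import deque

def mark_accepting_ids(successors, kind):
    """Return the set of marked IDs of an alternating computation graph.

    successors: dict mapping each ID to the list of its successor IDs
                (every ID must appear as a key). Hypothetical encoding.
    kind:       dict mapping each ID to 'exists', 'forall', 'accept' or 'reject'.
    """
    predecessors = {v: [] for v in successors}
    for v, outs in successors.items():
        for w in outs:
            predecessors[w].append(v)
    # Number of successors of each ID that have not been marked yet.
    unmarked = {v: len(outs) for v, outs in successors.items()}
    marked = set()
    # Start from the IDs without outgoing arcs that are accepting.
    queue = deque(v for v in successors
                  if kind[v] == "accept" and not successors[v])
    while queue:
        v = queue.popleft()
        if v in marked:
            continue
        marked.add(v)
        for u in predecessors[v]:
            unmarked[u] -= 1
            if u in marked:
                continue
            # Existential IDs need one marked successor, universal IDs need all.
            if kind[u] == "exists" or (kind[u] == "forall" and unmarked[u] == 0):
                queue.append(u)
    return marked
```

Each arc is examined once, so the sketch runs in time linear in the size of the ID graph, which is polynomial in the input under the logarithmic-space bound discussed above.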

Moreover, from such a marked graph it is straightforward to compute the
strategy graph of a greedy winning strategy, because IDs associated with
(children of) existential states encode the possible choices of the Captain. Just
visit the subgraph induced by the marked nodes starting from the initial
configuration, but for each ID associated with an existential state, select one
child to be visited arbitrarily (all choices in this subgraph are marked and
hence accepting).
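
As a hedged illustration of this visit, the following sketch extracts one accepting computation (and hence one strategy) from an already marked graph. The function name `extract_strategy` and the encoding (the same hypothetical dictionaries `successors` and `kind`, plus the set `marked` of marked IDs) are assumptions introduced here, not notation from the paper.

```python
def extract_strategy(successors, kind, marked, start):
    """Visit the marked subgraph from `start`, fixing one choice per existential ID.

    Returns a dict mapping every visited ID to the list of successors that are
    kept, or None if `start` is not marked (no winning strategy exists).
    """
    if start not in marked:
        return None
    chosen = {}
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        if kind[v] == "exists":
            # Any marked successor is a valid choice; take the first one.
            kept = [next(w for w in successors[v] if w in marked)]
        else:
            # Universal (and terminal) IDs keep all successors, which are all marked.
            kept = list(successors[v])
        chosen[v] = kept
        for w in kept:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return chosen
```

Taking the first marked successor is an arbitrary design choice; any other marked successor would yield a (possibly different) winning strategy.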

Corollary 5.11 The strategy graph of a greedy winning strategy (if any) in the 
Robber and Captain game is computable in polynomial time. 

5.3 Greedy Tree Projections and Larger Islands of Tractability

From the previous sections (see Theorem 5.4 and Example 5.8), we know that
monotone winning strategies for the Captain in the game over $(\mathcal{H}_1, \mathcal{H}_2)$ are
associated with tree projections of $\mathcal{H}_1$ w.r.t. $\mathcal{H}_2$, and that in some cases it is
possible that there is no monotone winning greedy strategy, although monotone
winning strategies (non-greedy) exist. In this section, we show that from any
(possibly non-monotone) greedy winning strategy a tree projection can still be
computed in polynomial time. The key fact here is that any non-monotone
greedy strategy can be converted into a monotone one, though not a greedy one
in general.

To show the result, it is useful to consider a special form of strategies that
we call nice (for they are reminiscent of nice tree decompositions of graphs),
where at every configuration the Captain first removes those cops that are no
longer in the frontier.

Formally, $\sigma$ is a nice strategy if $\sigma(h_p, M_p, C_p) = (h_p, \partial C_p)$, whenever
$\partial C_p \subset M_p$. Because such inactive cops play no role in the Robber and Captain game,
a winning nice strategy exists if (and only if) there exists a winning strategy,
and the same holds for greedy strategies. Just note that restricting the cops

^"For the sake of completeness note that, by using these ideas, one might also provide a 
direct dynamic programming algorithm to compute a strategy graph by using a bipartite 
graph representing all possible configurations and positions of the Robber and Captain game. 
However, we find the non-deterministic function GreedyWinningStrategy more elegant and 
easy to present. 
Figure 10: The strategy and component graphs for the nice strategy $\sigma_n$ in
Example 5.12.



to the border of $C_p$ is a legal choice in greedy strategies (it corresponds to
the selection of the same squad $h_r = h_p$ before attacking the robber in the
component $C_p$ with some further squad). Clearly enough, such a nice strategy
can be computed in polynomial time from any given strategy. Also, if desired,
the above polynomial time algorithm for computing a greedy strategy may be
easily adapted to compute directly a winning nice greedy strategy (if any).



Example 5.12 Consider again the setting discussed in Example 5.2 and
illustrated in Figure [T]. Note that the strategy $\sigma$ is not nice. Indeed, Figure 10
reports the strategy graph associated with a strategy $\sigma_n$ that is nice and that is
obtained from $\sigma$ by just explicitly adding the configurations where the Captain
has to remove the cops that are no longer in the frontier. ◁

The reason for introducing these nice strategies is that they admit a more
compact representation. First, given any configuration $(h_p, M_p, C_p)$ and a
Captain's choice $M_r$, the $[(h_p, M_p, C_p), M_r]$-options for the Robber are actually
determined by $C_p$ and $M_r$ only, because $\partial C_p$ is computable from $C_p$. Therefore,
we use hereafter the simplified notation $[C_p, M_r]$-option to refer to this set of
$[M_r]$-components. Moreover, in place of the strategy graph, we can use a
component graph, defined as follows.

Definition 5.13 Let $(\mathcal{H}_1, \mathcal{H}_2)$ be a pair of hypergraphs. Let $G = (N, A)$ be a
directed graph whose nodes are pairs of the form $(h_p, C_p)$, where $h_p \in edges(\mathcal{H}_2)$,
and $C_p$ is either the empty set or a $[\partial C_p]$-component of $\mathcal{H}_1$ such that $\partial C_p \subseteq h_p$.
Then, we say that $G$ is a component graph if it meets the following conditions:

(1) There is a root node $(\emptyset, nodes(\mathcal{H}_1)) \in N$ that is the only node without
incoming arcs.

(2) Each node {hp,Cp) G N, with Cp ^ 0, has outgoing arcs to m > 
nodes (hr, Ci), . . . , {hr, Cm) such that, if Mr is the set Ujli '^^j ^ i^p \ 
Uj^i C'j); it holds that Mr C hr and the [Cp, Mr]-options are the compo- 
nents Ci, Cm- 

(3) Each node $(h_p, C_p) \in N$ has an outgoing arc to $(h_r, \emptyset)$ if $C_p \subseteq h_r$. □

Note that every nice strategy $\sigma$ is encoded by the component graph $G_C(\sigma) =
(N, A)$ defined as follows. There is a node $(h_p, C_p)$ (resp., $(h_p, \emptyset)$) in $N$ if there is
a configuration $(h_p, \partial C_p, C_p)$ in the domain of $\sigma$ (resp., a capture configuration
$(h_p, \partial C_p, \emptyset)$ induced by $\sigma$). There is an arc in $A$ from a node $(h_p, C_p)$ to a node
$(h_r, C_r)$ if there is an arc from $(h_p, M_p, C_p)$ to $(h_r, M_r, C_r)$ in the strategy graph
$G(\sigma)$. No more nodes and arcs occur in $N$ and $A$, respectively. For instance,
the graph depicted on the bottom part of Figure [7] is the component graph
associated with the nice strategy $\sigma_n$ of Example 5.12.

Conversely, any component graph $G$ encodes a nice strategy $\sigma_G$, via the
following procedure. Associate the root $(\emptyset, nodes(\mathcal{H}_1))$ with the initial
configuration $(\emptyset, \emptyset, nodes(\mathcal{H}_1))$. Inductively, assume that a node $(h_p, C_p)$ is associated
with a configuration $(h_p, M_p, C_p)$, and that $(h_r, C_1), \dots, (h_r, C_m)$ are the labels of
the nodes having an incoming arc from $(h_p, M_p, C_p)$. Let $M_r$ be the set defined
as in condition (2) of Definition 5.13, with $M_r \subseteq h_r$. Then, define
$\sigma_G(h_p, M_p, C_p) = (h_r, M_r)$, and define
$\sigma_G(h_r, M_r, C_j) = (h_r, \partial C_j)$, with $j \in \{1, \dots, m\}$, in the case where $\partial C_j \subset M_r$.

Theorem 5.14 A tree projection of $\mathcal{H}_1$ w.r.t. $\mathcal{H}_2$ can be computed in
polynomial time if the Captain has a greedy winning strategy on $(\mathcal{H}_1, \mathcal{H}_2)$.

Proof. By Theorem 5.10, we can decide in polynomial time whether a winning
greedy strategy for the Captain in the game played on $(\mathcal{H}_1, \mathcal{H}_2)$ exists or not.
In the negative case, we are done. Otherwise, compute in polynomial time a
winning nice greedy strategy $\sigma$ (or turn a given strategy into a nice one), and
compute its component graph $G_C(\sigma)$. Make a copy $G' = (N', A')$ of $G_C(\sigma)$, and
note that $G'$ is a directed acyclic graph, because it encodes a winning strategy.

Let $v_1, \dots, v_{|N'|}$ be the topologically ordered sequence of the nodes
of $G'$, where the nodes without outgoing arcs, called leaves, are in the first
positions, and the node without incoming arcs, its root, is at the last position.
Note that leaves correspond to capture configurations for the robber, while
the root $v_{|N'|} = (\emptyset, nodes(\mathcal{H}_1))$ is associated with the starting configuration
$(\emptyset, \emptyset, nodes(\mathcal{H}_1))$ of the game. Moreover, if $(v, v') \in A'$, the node $v$ is said to be
a parent of $v'$, while $v'$ is said to be a child of $v$. Then, modify the graph $G'$
by navigating this sequence using an index $j$.

Starting with $j = 1$, while $j \leq |N'|$, consider the current node $v_j$ in the
sequence, associated with a configuration $(h_j, M_j, C_j)$ (initially, the first leaf)
in the domain of $\sigma_{G'}$. If every child of $v_j$ is labeled by some $(h'', C'')$ with
$C'' \subseteq C_j$, then let index $j := j + 1$ and continue the "while" loop, or stop
and output the current graph $G'$ if $v_j$ is the root. Otherwise, let $v_s$ be a
child of $v_j$ labeled by $(h_s, C_s) \in N'$ such that $C_s \not\subseteq C_j$, and associated with
the configuration $(h_s, M_s, C_s)$. That is, $\sigma_{G'}(h_j, M_j, C_j) = (h_s, M_s)$ is a
non-monotone move. Then, take any parent $v_p$ of $v_j$, and let $(h_p, M_p, C_p)$ be the
configuration associated with $v_p$ (whose label is thus $(h_p, C_p)$). Modify the
graph so that $\sigma_{G'}(h_p, M_p, C_p) = (h_j, M'_j)$, where $M'_j = M_j \setminus ED(v_j, M_s)$. In
particular, let $C'_j$ be the $[M'_j]$-component that properly includes $C_j$, and for
which thus $C_p \cup C'_j$ is $[M_p \cap M'_j]$-connected. Then, the modified component
graph will also encode the choice $\sigma_{G'}(h_j, M'_j, C'_j) = (h_j, \partial C'_j)$ if $\partial C'_j \subset M'_j$, and
$\sigma_{G'}(h_j, \partial C'_j, C'_j) = (h_s, M_s)$. The transformation of the graph is as follows:

(i) Add a node $v'_j$ labeled by $(h_j, C'_j)$ to $N'$ and to the sequence in the
position before $v_j$, and add to $A'$ an arc from $v'_j$ to each child of $v_j$, i.e.,
to nodes labeled by $(h_s, C'')$, for each $[C_j, M_s]$-option $C''$.

(ii) Remove from $A'$ all outgoing arcs of $v_p$ to nodes whose labels do not
contain $[C_p, M'_j]$-options (in particular, the arc towards $v_j$ is removed).

(iii) Add to $A'$ an arc from $v_p$ to $v'_j$.

(iv) Remove from $N'$ any node different from the root which is left without
incoming arcs, and continue the "while" loop considering again node $v_j$,
or the next available node in the sequence if $v_j$ has been removed from $N'$.

Example 5.15 The application of the above procedure to the nice strategy
$\sigma_n$ discussed in Example 5.12 is illustrated in Figure 11. Note that two
non-monotone moves are removed in total. Note that, at the end of the
transformation, we get a component graph encoding precisely the monotone strategy $\sigma$,
whose strategy graph has been illustrated in Figure [S). ◁

First observe that every iteration of the loop above precisely
implements on the graph $G'$ the transformation (of the non-monotone strategy
encoded by $G'$) described by Expression ([T|), and whose properties are described
by Lemma 5.6. In more detail, with these properties in mind, by executing
steps (i)-(iii) we replace the Captain's choice $(h_j, M_j)$ at $(h_p, M_p, C_p)$ by the
new choice $(h_j, M'_j)$, and we get the following situation: (a) Because of the
new choice $M'_j$, only one new $[C_p, M'_j]$-option is available to the robber, that
is, the $[M'_j]$-component $C'_j$ properly including the $[M_j]$-component $C_j$. As a
consequence, at step (i) the one node $v'_j$ corresponding to this component is
added to $N'$. (b) The $[C'_j, M_s]$-options are the same as the $[C_j, M_s]$-options,
so that the outgoing arcs of $v'_j$ will be the same as those of the node $v_j$. That is, we
keep the same winning strategy as before, as the Robber's options after the
Figure 11: Illustration of the algorithm in the proof of Theorem 5.14.



Captain's choice $M_s$ are the same as before (and hence the Captain knows how
to successfully attack them). (c) The set of $[C_p, M'_j]$-options, with the exception
of the new $C'_j$, is a subset of the $[C_p, M_j]$-options. In fact, some components
may collapse after the new choice of the Captain. Then, at step (iv), we remove
the nodes associated with $[C_p, M_j]$-options that are now left without incoming
arcs. For instance, it is possible that we delete $v_j$ if $v_p$ was its only parent, or it is
possible that we delete some nodes associated with collapsed components. Note
that the new graph $G'$ obtained from these steps is still a component graph,
hence it encodes a (new) nice strategy $\sigma_{G'}$.

Therefore, Lemma 5.6 entails that, after each iteration and thus after the
entire procedure, the strategy $\sigma_{G'}$ is a winning strategy. We claim that it is
actually a monotone winning strategy, by a simple inductive argument: if $v_j$ is
the current node, after the execution of steps (i)-(iv), $\sigma_{G'}$ is a monotone winning
strategy for the game starting at the configuration $v_j$. Then, the claim follows
because, for $j = |N'|$, it means that $\sigma_{G'}$ is a monotone winning strategy for the
whole game starting at the root. The base case is when the algorithm starts at
$j = 1$, and hence the statement holds because the first position in the sequence is occupied
by some leaf, which is a capture configuration of the winning strategy. Now
assume that the statement holds for $j - 1$, and consider the execution of the above
procedure on node $v_j$. Note that the proposed transformation deals with just
one (possibly new) component $C'_j$ instead of the strictly smaller $C_j$; everything
else in the strategy does not change, in particular no node preceding $v_j$ in the
topological order is affected by the transformation. Then, the monotonicity of
the strategy on the game starting at $v_j$ immediately follows from the induction
hypothesis and from Lemma 5.5 (1), which says that $ED(v'_j, M_s) = \emptyset$ and hence
that this move is monotone, so that $C'' \subseteq C'_j$, for each $[C'_j, M_s]$-option $C''$.

Because each iteration is feasible in polynomial time, it just remains to show
that the whole procedure requires at most polynomially many iterations. To this
end, note that whenever some node $v_j$ encodes a non-monotone move, one node
$v'_j$ is added to $N'$ for each parent $v_p$ of $v_j$. Indeed, the node $v_j$ is considered
again after the first iteration where it was evaluated, if it still has incoming
arcs (see step (iv)). However, after steps (i)-(iv), $\sigma_{G'}$ is a monotone winning
strategy for the game starting at the new configuration $v'_j$. Therefore, no new
node will be subject to further transformations in subsequent iterations along
the given topological ordering of $N'$. It follows that the number of iterations of
the described procedure is bounded by $|nodes(G_C(\sigma))| \times Maxin$, where $Maxin$ is
the largest in-degree over the nodes of $G_C(\sigma)$. Thus, the number of iterations is
bounded by a polynomial in the size of the strategy graph of the greedy winning
strategy, which is in its turn polynomial in the size of $(\mathcal{H}_1, \mathcal{H}_2)$.

Finally, from the monotone winning strategy $\sigma_{G'}$ encoded by the output $G'$
of the above procedure, a tree projection $\mathcal{H}_a$ of $(\mathcal{H}_1, \mathcal{H}_2)$ is immediately
available. Just define $nodes(\mathcal{H}_a) = nodes(\mathcal{H}_1)$ and $edges(\mathcal{H}_a) = \{M \mid \sigma_{G'}(v) =
(h, M)$ for some configuration $v$ in the domain of $\sigma_{G'}\}$. See [30] for more
detail about such a relationship between monotone strategies and tree
projections. □
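
To make the output of this construction concrete, the sketch below checks whether a candidate hypergraph is a tree projection. It assumes the standard definition, which is not restated in this section: $\mathcal{H}_a$ is a tree projection of $\mathcal{H}_1$ w.r.t. $\mathcal{H}_2$ if it is acyclic, every hyperedge of $\mathcal{H}_1$ is contained in some hyperedge of $\mathcal{H}_a$, and every hyperedge of $\mathcal{H}_a$ is contained in some hyperedge of $\mathcal{H}_2$. Acyclicity is tested with the GYO reduction; the function names and the encoding of hypergraphs as lists of sets are illustrative choices, not part of the paper.

```python
def is_acyclic(edges):
    """GYO reduction: a hypergraph is acyclic iff it reduces to no hyperedges."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Remove every node that occurs in exactly one hyperedge.
        for e in edges:
            for x in list(e):
                if sum(1 for f in edges if x in f) == 1:
                    e.discard(x)
                    changed = True
        # Remove empty hyperedges and hyperedges contained in another one
        # (among equal hyperedges, only the last copy is kept).
        kept = []
        for i, e in enumerate(edges):
            absorbed = any(e < f or (e == f and i < j)
                           for j, f in enumerate(edges) if j != i)
            if not e or absorbed:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges

def is_tree_projection(h1, ha, h2):
    """Check the three conditions of the (assumed) tree projection definition."""
    h1, ha, h2 = ([frozenset(e) for e in h] for h in (h1, ha, h2))
    covers_h1 = all(any(e <= f for f in ha) for e in h1)
    covered_by_h2 = all(any(f <= g for g in h2) for f in ha)
    return is_acyclic(ha) and covers_h1 and covered_by_h2
```

For example, for a triangle `[{"a","b"}, {"b","c"}, {"a","c"}]` and a resource hypergraph containing the hyperedge `{"a","b","c"}`, the single hyperedge `{"a","b","c"}` passes the check, while the triangle itself does not because it is cyclic.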

With the above result in place, let $\mathcal{C}_{gtp}$ denote the class of all pairs $(Q, \mathcal{V})$
such that there exists a greedy winning strategy $\sigma$ for the Captain in the game
R&C($\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}}$). As shown in the proof of Theorem 5.14, based on $\sigma$ a tree
projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_{\mathcal{V}}$, which we call greedy tree projection, can be computed
in polynomial time. Therefore, the following is immediately established.

Corollary 5.16 $\mathcal{C}_{gtp}$ is an island of tractability.

5.4 Captain vs Marshal 

A related class of tractable pairs has been defined in terms of the Robber
and Marshal game played by one Marshal and the Robber on the hypergraphs
$(\mathcal{H}_1, \mathcal{H}_2)$. This game has been originally defined on a single hypergraph to
characterize hypertree decompositions [25], and its natural extension to pairs of
hypergraphs has been defined and studied in [4]. The game is as follows. The
Marshal may control one hyperedge of $\mathcal{H}_2$ at each step. The Robber stands on
a node and can run at great speed along hyperedges of $\mathcal{H}_1$; however, (s)he is
not permitted to run through a node that is controlled by the Marshal. Thus,
a configuration is a pair $(h, C)$, where $h$ is the hyperedge controlled by the
Marshal, and $C$ is an $[h]$-component where the Robber stands. Let $(h_p, C_p)$
be a configuration. This is a capture configuration, where the Marshal wins, if
$C_p \subseteq h_p$. Otherwise, the Marshal moves to another hyperedge $h_r \in edges(\mathcal{H}_2)$;
while (s)he moves, the Robber may run through those nodes that are left by
the Marshal or not yet occupied. Thus, the Robber selects an $[h_r]$-component
$C_r$ such that $C_r \cup C_p$ is $[h_p \cap h_r]$-connected. We say that the Marshal has a
winning strategy if, starting from the initial configuration $(\emptyset, nodes(\mathcal{H}_1))$, (s)he may end
the game in a capture position, no matter what the Robber's moves are. A winning
strategy is monotone if the Marshal may monotonically shrink the set of nodes
where the Robber stands.
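
The Robber's options in a single move of this game can be computed with plain connected-component searches. The following is a minimal sketch, assuming the usual notion that two nodes are $[M]$-adjacent when some hyperedge of $\mathcal{H}_1$ contains both and neither belongs to $M$; the function names `components` and `robber_options` and the set-based encoding are hypothetical.

```python
def components(nodes, edges, blocked):
    """Connected components of the hypergraph after removing the blocked nodes."""
    free = set(nodes) - set(blocked)
    result = []
    while free:
        seed = free.pop()
        comp, stack = {seed}, [seed]
        while stack:
            x = stack.pop()
            for e in edges:
                if x in e:
                    for y in e:
                        if y in free:
                            free.remove(y)
                            comp.add(y)
                            stack.append(y)
        result.append(frozenset(comp))
    return result

def robber_options(nodes, edges, h_p, c_p, h_r):
    """[h_r]-components C_r such that C_r together with C_p is [h_p & h_r]-connected.

    While the Marshal moves from h_p to h_r, only the nodes in both hyperedges
    stay blocked, so the Robber can reach any node in the component of C_p
    computed with respect to that intersection.
    """
    shared = set(h_p) & set(h_r)
    reachable = next(c for c in components(nodes, edges, shared) if c & set(c_p))
    return [c for c in components(nodes, edges, h_r) if c <= reachable]
```

On the path a-b-c-d-e with the Marshal on `{"c"}` and the Robber in `{"a","b"}`, moving to `{"b"}` lets the Robber escape to `{"c","d","e"}` (a non-monotone outcome), whereas moving to `{"b","c"}` leaves `{"a"}` as the only option.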

Because only nodes in the frontier are actually used at each step in the 
monotone Robber and Marshal game, the monotone variants of the above two 
games clearly define the same hypergraph properties. 

Fact 5.17 The following are equivalent: 

(1) There is a monotone winning strategy for the Marshal in the Robber and
Marshal game on $(\mathcal{H}_1, \mathcal{H}_2)$.

(2) There is a monotone winning greedy strategy for the Captain in the Robber
and Captain game on $(\mathcal{H}_1, \mathcal{H}_2)$.

Let $\mathcal{C}_{rm}$ denote the class of all pairs $(Q, \mathcal{V})$ such that there exists a monotone
winning strategy for the Marshal on $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$. From known results, $\mathcal{C}_{rm}$ is
an island of tractability as well. However, the set of tractable instances identified
by greedy winning strategies in the Robber and Captain game properly includes
this class. The reason is that greedy winning strategies are allowed to be
non-monotone.

Theorem 5.18 $\mathcal{C}_{rm} \subset \mathcal{C}_{gtp}$.

Proof. Because greedy strategies are not required to be monotone, $\mathcal{C}_{rm} \subseteq \mathcal{C}_{gtp}$
follows from Fact 5.17. For the proper inclusion, just consider again
Example 5.8. The pair of hypergraphs shown in Figure [7] is such that the Marshal
has no monotone winning strategy, while the Captain has a (non-monotone)
winning greedy strategy. □

For completeness, recall that the non-monotone variant of the Marshal and
Robber game is instead too powerful to be useful. Indeed, there are pairs of
hypergraphs where the Marshal has a non-monotone winning strategy but no
tree projection exists. We refer the interested reader to [4] for more detail about
the monotonicity gap in the Robber and Marshal game, and to [32] for a measure
of distance between non-monotone strategies in the Robber and Marshal game
and tree projections.

5.5 Greedy Decomposition Methods 

The tractability result about the general case of greedy tree projections can be 
immediately applied to every structural decomposition method, in order to get 
new tractable variants of these methods. 

Recall from Definition 4.1 that a structural decomposition method DM is
a pair of polynomial-time computable functions v-DM and d-DM that, given a

^^This example is in fact inspired by a similar simpler pair of hypergraphs where no mono- 
tone strategy for the Marshal exists, described in [4]. 
Figure 12: Examples in the proof of Fact 5.20.



conjunctive query $Q$ and a database DB', compute a view system $\mathcal{V}$ = v-DM($Q$)
and a database DB'' = d-DM($Q$, DB') over the vocabulary of $\mathcal{V}$ that may be used
to answer $Q$ on DB'. In particular, the decompositions of $Q$ according to DM
are tree projections of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_{\mathcal{V}}$. Then, it is natural to consider the greedy
variant of any structural decomposition method DM, denoted by greedy-DM, whose
associated decompositions are the greedy tree projections of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_{\mathcal{V}}$.

From Corollary 5.16, every decomposition method, possibly an intractable
one such as the generalized hypertree decomposition method, defines an island
of tractability by means of its greedy variant.

Fact 5.19 Let DM be a structural decomposition method and let greedy-DM be its
greedy variant. Then, the class of all queries having a greedy-DM decomposition
is recognizable in polynomial time, and every query in the class may be evaluated
in polynomial time over any given database.

For a notable example, consider the method based on generalized hypertree
decompositions. Let $k \geq 1$. Recall that the width-$k$ generalized hypertree
decompositions of a query $Q$ are the tree projections of $(\mathcal{H}_Q, \mathcal{H}_Q^k)$, as the view
set $v\text{-}hw_k(Q)$ contains one distinct view over each set of variables that can
be covered by at most $k$ query atoms. Then, the width-$k$ greedy hypertree
decompositions (we omit "generalized", for short) of $Q$ are the greedy tree
projections of $(\mathcal{H}_Q, \mathcal{H}_Q^k)$. Accordingly, the greedy (generalized) hypertree width of $Q$,
denoted by gr-hw($Q$), is the smallest $k$ such that $Q$ has a width-$k$ greedy hypertree
decomposition. In fact, this greedy variant provides a new tractable approximation of
the (intractable) notion of generalized hypertree decomposition, which is better
than (standard) hypertree decompositions.
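
As a rough illustration of the view set used here, the sketch below builds the hyperedges obtained as unions of at most $k$ hyperedges of the query hypergraph. It is a simplification under stated assumptions: only the unions themselves are generated (subsets of a union are omitted, since each is contained in a generated hyperedge), and the function name `k_unions` is hypothetical.

```python
from itertools import combinations

def k_unions(edges, k):
    """All distinct unions of at least one and at most k hyperedges."""
    edges = [frozenset(e) for e in edges]
    result = set()
    for size in range(1, k + 1):
        for combo in combinations(edges, size):
            result.add(frozenset().union(*combo))
    return result
```

The number of generated hyperedges is at most the sum of the binomial coefficients $\binom{|E|}{i}$ for $i \leq k$, hence polynomial in the size of the query for every fixed $k$.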

Fact 5.20 For any query $Q$, $ghw(Q) \leq gr\text{-}hw(Q) \leq hw(Q)$ holds. Moreover,
there are queries $Q$ for which $gr\text{-}hw(Q) < hw(Q)$, even for $gr\text{-}hw(Q) = 2$.

Proof. The first relationship is immediate: in the first inequality we use the
fact that greedy hypertree decompositions are a special case of generalized
hypertree decompositions, while the second inequality holds because the notion of
hypertree decomposition is characterized by the monotone Robber and Marshals
game, played on $\mathcal{H}_Q$ by a Robber and $k$ Marshals [25]. This game is
equivalent to playing the monotone game with one Marshal on the pair of hypergraphs
$(\mathcal{H}_Q, \mathcal{H}_Q^k)$, which is the same as playing the monotone Robber and Captain
game.

For the strict upper bound $gr\text{-}hw(Q) < hw(Q)$, consider the query $Q_0$, taken
from [m [53], whose hypergraph $\mathcal{H}_{Q_0}$ is depicted in the left part of Figure 12.
For this query, it is shown in [26] that $hw(Q_0) = 3$ and $ghw(Q_0) = 2$. However,
$gr\text{-}hw(Q_0) = 2$ holds. Indeed, there is a winning greedy strategy for the Captain
in the game played on $(\mathcal{H}_{Q_0}, \mathcal{H}_{Q_0}^2)$, as shown in the central part of Figure 12,
and thus there exists a greedy tree projection of $\mathcal{H}_{Q_0}$ w.r.t. $\mathcal{H}_{Q_0}^2$. In the figure,
the set of selected cops at each step is underlined in such a way that the reader
may identify the original pair of hyperedges from $\mathcal{H}_{Q_0}$ that forms the chosen
squad in $\mathcal{H}_{Q_0}^2$. Note that the strategy is non-monotone, as witnessed by the
right branch where the Robber can return to the node $B$. However, by using
the construction in Theorem 5.14, it can be turned into a monotone (though not
greedy) one, by removing the escape door $B$ in the first move of the Captain
(see the right part of the figure). From the monotone strategy, we immediately
get the desired tree projection. □

More general examples are given by the subedge-based decomposition
methods, defined in [55]. Recall that a subedge-method DM is based on a function $f$
associating with each integer $k \geq 1$ and each hypergraph $\mathcal{H}_Q = (V, E)$ of some
query $Q$ a set $f(\mathcal{H}_Q, k)$ of subedges of $\mathcal{H}_Q$, that is, a set of subsets of hyperedges
in $E$. Moreover, the set of width-$k$ DM-decompositions of $Q$ can be obtained as
follows: (1) obtain a hypertree decomposition HD of $\mathcal{H}_f = (V, E \cup f(\mathcal{H}_Q, k))$, and
(2) convert HD into a generalized hypertree decomposition of $\mathcal{H}_Q$ by replacing
each subedge $h \in f(\mathcal{H}_Q, k) \setminus E$ occurring in HD by some hyperedge $h' \in E$ such
that $h \subseteq h'$ (which exists because $h$ is a subedge).

Because such a method is based on width-$k$ hypertree decompositions, in
the tree projection framework it can be recast as follows. A width-$k$
DM-decomposition is any tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_f^k$ associated with some
monotone winning strategy of the Robber and Marshal game on this pair of
hypergraphs. On the other hand, according to its greedy variant greedy-DM, the
width-$k$ decompositions are the greedy tree projections of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_f^k$. It
follows that the greedy variant of this method is more powerful, in general.

Fact 5.21 Let DM be any subedge-based decomposition method. Let $k \geq 1$ and
let $Q$ be a query. Then, a width-$k$ DM-decomposition of $Q$ exists only if a width-$k$
greedy-DM-decomposition of $Q$ exists. The converse does not hold, in general.

Proof. The first entailment follows from Theorem 5.18. The fact that the
converse does not hold in general follows from Fact 5.20, because the
hypertree decomposition method is a subedge-based method (based on the function
$f(\mathcal{H}_Q, k) = \emptyset$). □

This is a remarkable result, as in [26] some examples of subedge-based
decomposition methods, such as the component hypertree decompositions, are shown to
generalize most previous proposals of tractable structural decomposition
methods, such as hypertree and spread-cut decompositions (in fact, all of them,
but the approximation of fractional hypertree decomposition, later introduced
in [10]). From Fact 5.21, their greedy variants are even more powerful.

6 Tractability of Tree Projections over Small 
Arity Structures 

In this (light) section, we consider the case of relational structures having small
arity, which is a relevant special case in real-world applications.

In fact, observe that any variable that is not involved in any join operation 
in a conjunctive query (that is, any variable that occurs in one atom only) is 
irrelevant and may be projected out in a preprocessing phase. It follows that 
the effective arity to be considered in our structural techniques is actually de- 
termined by the largest number of variables that any atom has in common with 
other atoms (i.e., those variables involved in join operations), independently of 
the arity of the relations in the original database schema. This number is often
small, in practice.
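
The notion of effective arity described above is easy to compute. A minimal sketch follows, where each atom is given simply as a tuple of its variable names; the function name `effective_arity` and this encoding are assumptions made for illustration.

```python
from collections import Counter

def effective_arity(atoms):
    """Largest number of variables that an atom shares with other atoms."""
    # For each variable, count the atoms in which it occurs.
    occurrences = Counter(v for atom in atoms for v in set(atom))
    return max(
        (sum(1 for v in set(atom) if occurrences[v] > 1) for atom in atoms),
        default=0,
    )
```

For instance, with atoms over the variables `(X, Y, Z, U)`, `(Y, W)` and `(Z, T, W)`, the largest relation has arity 4, but each atom shares only two variables with the others, so the effective arity is 2.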

Therefore, it is interesting to investigate whether the general problem of 
computing a tree projection of a pair of hypergraphs is any easier in the case 
of small arity structures (for the sake of presentation, we just consider here the 
standard structure arity, leaving to the interested reader the straightforward 
extension to the above mentioned "effective arity"). We next show that the 
problem is indeed in polynomial-time for bounded-arity structures, and it is 
moreover fixed-parameter tractable (FPT), if the arity is used as a parameter of 
the problem. This is not difficult to prove, but it was never stated before (as far 
as we know), and we believe it is important to pinpoint this tractability result. 

Recall that a problem is FPT if there is an algorithm that solves the problem
in fixed-parameter polynomial-time, that is, with a cost $f(k) \cdot O(n^{O(1)})$, for some
computable function $f$ that is applied to the parameter $k$ only. In other words,
this algorithm not only runs in polynomial time if $k$ is bounded by a fixed
number, but it also exhibits a "nice" dependency on the parameter, because $k$
is not in the exponent of the input size $n$. Let p-TP be the problem of computing
a tree projection of $\mathcal{H}_Q$ w.r.t. $\mathcal{H}_{\mathcal{V}}$, for a given pair $(Q, \mathcal{V})$, parameterized by the
maximum arity of the relations occurring in $(Q, \mathcal{V})$.

Theorem 6.1 The problem p-TP is fixed-parameter tractable. 

Proof. Let $(Q, \mathcal{V})$ be an input pair for p-TP, let $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$ be the pair of
associated hypergraphs, and let $k$ be the parameter.

^^In fact, it is easy to further generalize this line of reasoning, by considering as "effective
arity" the maximum cardinality over the hyperedges in the GYO-reduct of $\mathcal{H}_Q$. (Recall that
the GYO-reduct of a hypergraph is obtained by iteratively removing nodes that occur in
one hyperedge only and hyperedges included in other hyperedges, until no further removal is
possible; see, e.g., [51].)
Compute the simplicial version $\mathcal{H}_s$ of the hypergraph $\mathcal{H}_{\mathcal{V}}$, that is, the
hypergraph having the same set of nodes as $\mathcal{H}_{\mathcal{V}}$, and where
$edges(\mathcal{H}_s) = \{h' \neq \emptyset \mid h' \subseteq h, h \in edges(\mathcal{H}_{\mathcal{V}})\}$. Therefore, $edges(\mathcal{H}_s)$
contains all subsets of every hyperedge of $\mathcal{H}_{\mathcal{V}}$. Clearly, $\mathcal{H}_s$ can be computed in time
$O(2^k \times |edges(\mathcal{H}_{\mathcal{V}})|)$, and the
tree projections of $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$ are the same as the tree projections of $(\mathcal{H}_Q, \mathcal{H}_s)$.
To conclude, observe that any tree projection of the latter pair can be
computed in polynomial time by Theorem 5.14 and the fact that, having a squad
for every possible set of cops in any squad/hyperedge of $\mathcal{H}_{\mathcal{V}}$, the greedy
strategies in the game R&C($\mathcal{H}_Q, \mathcal{H}_s$) are precisely the (unrestricted) strategies in the
game R&C($\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}}$), which characterize the tree projections of $(\mathcal{H}_Q, \mathcal{H}_{\mathcal{V}})$. □
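
The simplicial version used in this proof can be sketched in a few lines. The function name `simplicial_version` is hypothetical, and the sketch assumes that node names are mutually comparable (for instance, strings) so that they can be sorted; it enumerates the non-empty subsets of each hyperedge.

```python
from itertools import combinations

def simplicial_version(edges):
    """All non-empty subsets of every hyperedge, as a set of frozensets."""
    result = set()
    for e in edges:
        items = sorted(e)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                result.add(frozenset(subset))
    return result
```

A hyperedge with $k$ nodes contributes $2^k - 1$ subsets, which matches the $O(2^k \times |edges(\mathcal{H}_{\mathcal{V}})|)$ bound stated in the proof when $k$ is the maximum arity.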

The above tractability result is smoothly inherited by all structural
decomposition methods DM such that the arity of the views in v-DM is $O(f(k))$ for
some computable function $f$ that does not depend on the size of the input.
For instance, this is the case for the methods based on bounded (generalized
hyper)tree decompositions, but not for fractional hypertree decompositions. In
particular, if $w$ is the fixed maximum width for a class of queries having bounded
generalized hypertree width, the maximum arity of the computed views is $w \times k$.
Thus, if $p\text{-}ghw_w$ denotes the problem of computing a width-$w$ generalized
hypertree decomposition of a query, parameterized by the maximum arity of the
query atoms, we immediately get the following result.

Corollary 6.2 The problem $p\text{-}ghw_w$ is fixed-parameter tractable.

We believe that this is a useful result. Indeed, even if for queries $Q$
having maximum arity $k$ we have $ghw(Q) \leq tw(Q) \leq k \times ghw(Q)$, we know that
the problem of evaluating queries is not fixed-parameter tractable, with respect
to the (generalized hyper)treewidth parameter. It follows that, under usual
fixed-parameter complexity assumptions, an exponential dependency on such
width parameters is unavoidable, hence evaluating such queries has a cost of
the form $O(n^{O(w)})$, where $w$ is the treewidth (or the hypertree width) and $n$
is the combined size of the database and the query (which is typically largely
dominated by the size of the database). We thus argue that employing
generalized hypertree width instead of treewidth provides an exponential saving
in the query-evaluation time, in general, and it is convenient even for small
arity instances. Moreover, recall that the computation of the decomposition
depends on the hypergraph only (and not on the database) and, unlike other
fixed-parameter algorithms, the algorithm described in Theorem 6.1 is
"practical," as there are no huge constants and the dependence on the arity parameter
is single-exponential.

* Note that the same relationship holds for the monotone strategies and, hence, for the
Marshal's strategies in the Robber and Marshal game over the pair
$(\mathcal{H}_Q, \mathcal{H}_s)$, as observed by Adler [3].
7 Conclusion

In this paper, we have fully characterized the power of algorithms for evaluating
conjunctive queries (and constraint satisfaction problems) based on enforcing 
local consistency. We studied both the general framework where consistency 
is enforced over arbitrary views and the more specific cases where views are 
computed according to structural decomposition methods. These results have 
already found application to the problems of enumerating query answers [52] 
and computing optimal solutions [33].

In addition to the questions mentioned in the Introduction, it is worthwhile 
recalling another open question that finally finds an answer with these results.
The question was raised in [29], where the tree projection theorem was
proved. Roughly, a query program P is a finite sequence of steps involving 
project, select and join operations. The relation computed in the final step is 
the result of P. The tree projection theorem states that a query program P 
solves a query Q (i.e., the result of P always coincides with the answers of Q 
over its set of output variables) if, and only if, there is a tree projection of Q 
w.r.t. the hypergraph associated with the various relations/views determined 
by P. A crucial point here is that P is a fixed program, so that the number 
of its operations does not depend on the database size. The natural question 
in [29] was therefore to ask what happens if P is allowed to contain a
"semijoin loop," that is, a loop that is to be executed until nothing changes in
the involved relations/views. Is it the case that the tree projection theorem still
holds for such programs, where the number of steps is data-dependent? The 
results in the paper provide a positive answer to this question for the setting 
of simple queries (implicitly) considered in [29] and, in fact, also a complete
answer covering the general case where queries may contain multiple atoms over
the same relation symbol.
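
To make the notion of a semijoin loop concrete, the following is a minimal
sketch of such a loop, written under assumptions of our own: relations are
encoded as a pair (schema, set of tuples), and the names `semijoin` and
`semijoin_loop` are hypothetical. It is a naive pairwise procedure meant only
to show why the number of executed steps depends on the data, not an
implementation of the programs studied in [29].

```python
def semijoin(r_schema, r_rows, s_schema, s_rows):
    """Keep the rows of r that agree with some row of s on the shared attributes."""
    shared = [a for a in r_schema if a in s_schema]
    r_idx = [r_schema.index(a) for a in shared]
    s_idx = [s_schema.index(a) for a in shared]
    s_keys = {tuple(row[i] for i in s_idx) for row in s_rows}
    return {row for row in r_rows if tuple(row[i] for i in r_idx) in s_keys}


def semijoin_loop(relations):
    """Apply pairwise semijoins until no relation changes (a fixpoint)."""
    rels = {name: (schema, set(rows)) for name, (schema, rows) in relations.items()}
    changed = True
    while changed:
        changed = False
        for r in rels:
            for s in rels:
                if r == s:
                    continue
                r_schema, r_rows = rels[r]
                s_schema, s_rows = rels[s]
                kept = semijoin(r_schema, r_rows, s_schema, s_rows)
                if kept != r_rows:
                    rels[r] = (r_schema, kept)
                    changed = True
    return rels


# Toy database for the query R(A,B), S(B,C), T(C,D).
database = {
    "R": (("A", "B"), {(1, 2), (3, 4)}),
    "S": (("B", "C"), {(2, 5)}),
    "T": (("C", "D"), {(5, 6), (7, 8)}),
}
reduced = semijoin_loop(database)
```

At the fixpoint, every tuple of each relation matches at least one tuple of
every other relation, which is the local consistency property discussed in the
paper. How many passes of the outer loop are needed is determined by the
contents of the relations, not by the query alone.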

Finally, by exploiting a recent hypergraph-game characterization of tree pro- 
jections, we also identified new islands of (structural) tractability, and we pin- 
pointed the fixed-parameter tractability of tree projections and of (most) struc- 
tural decomposition methods when small arity structures are considered. We 
believe that such results may be very useful in practical applications, and we 
are currently working on direct implementations of the proposed techniques in 
real-world database management systems. 

There are still a number of interesting questions to be answered about struc- 
tural decomposition methods. For instance, even for the bounded arity case, the 
frontier of tractability for the problem of enumerating with polynomial delay the 
answers of a conjunctive query Q over a given arbitrary set of output variables 
is not known (see [311 [TD]). Moreover, in the general unbounded-arity case, the
frontier of tractability is not known even for Boolean conjunctive queries. In 
fact, in the unbounded-arity case, the notion of submodular width [41] allows
us to identify the class of conjunctive queries that are fixed-parameter tractable 
(where the parameter is the size of the query), assuming the exponential-time 
hypothesis. As a consequence, we now have an interesting gap to be explored 
between the polynomial-time tractability of instances having bounded fractional 
hypertree width [351 HD] and the fixed-parameter tractability of instances having
bounded submodular width. 